Wikidata:WikiProject Collection highlights National Library of the Netherlands/Flora Batava/Machine reuse
This page is part of the WikiProject Collection highlights National Library of the Netherlands, subproject Flora Batava. This project is part of the Wikidata efforts of the Koninklijke Bibliotheek Nederland.
Machine & programmatic reuse
This page aims to give guidance and inspiration to users who want to set up programmatic interactions with the data and images of the Flora Batava.
See also
- The overview of SPARQL example queries related to Flora Batava
- Examples of how you can programmatically reuse other collection highlights of the KB, for instance for/in your own websites, services, apps, hackathons and projects. This page discusses SPARQL, APIs, Python scripts, JSON, XML, image bulk downloading and machine interactions with KB's highlights.
Wikidata SPARQL results as JSON/XML -- TODO
- JSON:
  - All plates, all volumes, including images and volumes, in JSON
  - Dutch Wikipedia articles about depicted plant species, as JSON
- XML: https://query.wikidata.org/sparql?query=SELECT%20DISTINCT%20%3Fitem%20%3FitemLabel%20%3FitemDescription%0AWHERE%20%0A{%0A%20%20wd%3AQ117860156%20wdt%3AP527%20%3Fitem.%0A%20%20SERVICE%20wikibase%3Alabel%20{%20bd%3AserviceParam%20wikibase%3Alanguage%20%22[AUTO_LANGUAGE]%2Cen%22.%20}%20%0A}%0AORDER%20BY%20%3FitemLabel&format=xml
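The same SPARQL endpoint can return JSON instead of XML. A minimal sketch (not from the original page) of requesting the parts of Flora Batava (Q117860156) as JSON from Python; the User-Agent value is a placeholder that you should replace with your own contact details.

```python
import requests

# The query from the XML example above, requested as JSON instead
query = """
SELECT DISTINCT ?item ?itemLabel
WHERE {
  wd:Q117860156 wdt:P527 ?item.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?itemLabel
"""
r = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "FloraBatavaExample/1.0 (replace-with-your-contact)"},
)
r.raise_for_status()
# SPARQL JSON results live under results -> bindings
bindings = r.json()["results"]["bindings"]
for b in bindings[:5]:
    print(b["item"]["value"], "-", b["itemLabel"]["value"])
```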
Requesting Wikidata items in machine-readable output formats (JSON, XML, etc.)
Full Q-items can be requested as JSON in three ways, for example Plate 0041, Flora Batava (KB), volume 1 (Q118398291):
- Via Special:EntityData: https://www.wikidata.org/wiki/Special:EntityData/Q118398291.json
- Via the Wikidata Action API: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q118398291&format=json
- Via the Wikidata REST API: https://wikidata.org/w/rest.php/wikibase/v0/entities/items/Q118398291
1) Via Special:EntityData
- See https://www.mediawiki.org/wiki/Wikibase/EntityData + https://www.wikidata.org/wiki/Special:EntityData
- Requesting the URL https://www.wikidata.org/wiki/Special:EntityData/Q118398291 for the Wikidata item Plate 0041, Flora Batava (KB), volume 1 (Q118398291) uses Special:EntityData and content negotiation to return HTML in your browser.
- If you don't want to depend on content negotiation (e.g. to view non-HTML content in a web browser), you can actively request alternative formats by appending a format suffix to the URL, e.g. to retrieve JSON: Special:EntityData/Q118398291.json, or equivalently using the format argument, e.g. Special:EntityData?id=Q118398291&format=json.
- Other available formats are JSON-LD, RDF, NT, TTL or N3, and PHP, e.g. Special:EntityData/Q118398291.nt
- It also works on Wikimedia Commons, e.g. Special:EntityData/M132042368.rdf
Python implementation
import requests
import json

baseurl = "https://www.wikidata.org/wiki/Special:EntityData/"
qnumbers = ['Q118398291']  # Plate 0041, Flora Batava (KB), volume 1
for qnum in qnumbers:
    url = baseurl + qnum + '.json'
    headers = {
        'Accept': 'application/json',
        'User-Agent': 'User:OlafJanssen - olaf.janssen@kb.nl'
    }
    r = requests.get(url, headers=headers)
    data = json.loads(r.text)
    print(data)
2) Wikidata Action API
- See https://www.wikidata.org/w/api.php + https://www.wikidata.org/wiki/Wikidata:Data_access#MediaWiki_Action_API
- Full item: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q118398291&format=json (JSON data about Plate 0041, Flora Batava (KB), volume 1 (Q118398291))
- All labels: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q118398291&props=labels&format=xml (as XML)
- Labels in Dutch (nl): https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q118398291&props=labels&languages=nl
- Claims only:
- https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q118398291&format=json (JSON)
- https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q118398291&format=xml (XML)
- https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q118398291&property=P180 (depicts (P180) = Bromus sterilis (Q159146))
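The Action API URLs above can also be called from Python. A minimal sketch of reading the depicts (P180) claims of Plate 0041 (Q118398291) via wbgetclaims; the User-Agent value is a placeholder.

```python
import requests

# Call the Action API's wbgetclaims module for a single property
r = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbgetclaims",
        "entity": "Q118398291",
        "property": "P180",
        "format": "json",
    },
    headers={"User-Agent": "FloraBatavaExample/1.0 (replace-with-your-contact)"},
)
r.raise_for_status()
# Claims are grouped per property ID in the response
claims = r.json().get("claims", {}).get("P180", [])
for claim in claims:
    # Each value is an item reference, e.g. Bromus sterilis (Q159146)
    print(claim["mainsnak"]["datavalue"]["value"]["id"])
```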
3) Wikidata REST API
- See https://www.wikidata.org/wiki/Wikidata:REST_API + https://doc.wikimedia.org/Wikibase/master/js/rest-api/
- Main advantage: a cleaner, flatter structure in the JSON response data, compared to the other two methods above
- Examples:
- Full item: https://wikidata.org/w/rest.php/wikibase/v0/entities/items/Q118398291 (JSON data about Plate 0041, Flora Batava (KB), volume 1 (Q118398291))
- All labels: https://wikidata.org/w/rest.php/wikibase/v0/entities/items/Q118398291/labels
- Labels in Dutch (nl): https://wikidata.org/w/rest.php/wikibase/v0/entities/items/Q118398291/labels/nl
- Things depicted on this plate: https://www.wikidata.org/w/rest.php/wikibase/v0/entities/items/Q118398291/statements/Q118398291$A131F012-737B-40F2-B172-66865CF9714C : depicts (P180) = Bromus sterilis (Q159146) (the statement ID for P180 can be found via https://wikidata.org/w/rest.php/wikibase/v0/entities/items/Q118398291/statements)
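A minimal sketch of the flatter REST API responses from Python: the /labels/nl route above returns the Dutch label as a bare JSON string, with no wrapping object to unpack. The User-Agent value is a placeholder.

```python
import requests

# Fetch only the Dutch (nl) label of Plate 0041 via the REST API
url = "https://www.wikidata.org/w/rest.php/wikibase/v0/entities/items/Q118398291/labels/nl"
r = requests.get(url, headers={"User-Agent": "FloraBatavaExample/1.0 (replace-with-your-contact)"})
r.raise_for_status()
# The response body is a single JSON string, not an object
label = r.json()
print(label)
```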
API calls on Wikimedia Commons -- TODO
Get all files (namespace=6) and their image URLs in Category:Flora Batava - KB copy, Volume 1, 1800 (see also this table for how to get them via SPARQL)
- JSON in pretty-print in HTML: https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Flora%20Batava%20-%20KB%20copy,%20Volume%201,%201800&gcmnamespace=6&gcmlimit=500&prop=imageinfo&iiprop=url&format=jsonfm (or XML, see list of available formats).
- JSON raw: https://commons.wikimedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Flora%20Batava%20-%20KB%20copy,%20Volume%201,%201800&cmnamespace=6&cmlimit=500&format=json (or XML).
- See relevant API documentation
Via https://api.wikimedia.org/wiki/Main_Page
- See https://api.wikimedia.org/wiki/Reusing_free_images_and_media_files_with_Python
- Request URLs of various resolutions of Plate 0041: https://api.wikimedia.org/core/v1/commons/file/File:Bromus_sterilis_(modern=Anisantha_sterilis)_-_Pl0041_-_FloraBatava-KB-v01.jpg
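A minimal sketch of calling that Wikimedia Core REST API file route from Python, to list the available renditions (original, preferred, thumbnail) of Plate 0041; the User-Agent value is a placeholder.

```python
import requests

# Ask the Core REST API for the renditions of a Commons file
title = "File:Bromus_sterilis_(modern=Anisantha_sterilis)_-_Pl0041_-_FloraBatava-KB-v01.jpg"
r = requests.get(
    "https://api.wikimedia.org/core/v1/commons/file/" + title,
    headers={"User-Agent": "FloraBatavaExample/1.0 (replace-with-your-contact)"},
)
r.raise_for_status()
info = r.json()
# The response holds one block per rendition, each with a direct URL
print(info["original"]["url"])
```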
Python code examples
Get all files and their URLs from Flora Batava Volume 2
Using Python scripts, with three variations.
Example: for File:Agrostis Spica venti (modern=Apera spica-venti) - Pl0151 - FloraBatava-KB-v02.jpg
- MediaID = M132800954 (see this page)
- Title (as string) = File:Agrostis Spica venti (modern=Apera spica-venti) - Pl0151 - FloraBatava-KB-v02.jpg
- FullimageUrl: https://upload.wikimedia.org/wikipedia/commons/e/e8/Agrostis_Spica_venti_%28modern%3DApera_spica-venti%29_-_Pl0151_-_FloraBatava-KB-v02.jpg (via Special:Redirect)
- PageUrl: https://commons.wikimedia.org/wiki/File:Agrostis_Spica_venti_(modern%3DApera_spica-venti)_-_Pl0151_-_FloraBatava-KB-v02.jpg
- PageShortUrl: https://commons.wikimedia.org/w/index.php?curid=132800954 (or alternatively via the concept URI https://commons.wikimedia.org/entity/M132800954)
Variation 1, Commons Action API
#!/usr/bin/python3
# Get all files and their URLs from Flora Batava Volume 2
import requests, json

s = requests.Session()
apibaseurl = "https://commons.wikimedia.org/w/api.php"
cat = "Category:Flora_Batava_-_KB_copy,_Volume_2,_1807"
headers = {'Accept': 'application/json', 'User-Agent': 'User:OlafJanssen - ' + cat}
# https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Flora%20Batava%20-%20KB%20copy,%20Volume%201,%201800&gcmnamespace=6&gcmlimit=500&prop=imageinfo&iiprop=url&format=json
params = {
    "action": "query",
    "generator": "categorymembers",
    "gcmtitle": cat,
    "gcmlimit": "500",
    "gcmnamespace": "6",
    "prop": "imageinfo",
    "iiprop": "url",
    "format": "json",
}
r = s.get(url=apibaseurl, params=params, headers=headers)
filesdata = json.loads(r.text)
pages = list(filesdata.get('query').get('pages').values())
# Header of CSV output
print('"MediaID","Title","FullimageUrl","PageUrl","PageShortUrl"')
for page in pages:
    pageid = page.get('pageid')
    title = page.get('title')
    fullimageurl = page.get('imageinfo')[0].get('url')
    pageurl = page.get('imageinfo')[0].get('descriptionurl')
    pageshorturl = page.get('imageinfo')[0].get('descriptionshorturl')
    # Body of CSV output
    print('"M%s","%s","%s","%s","%s"' % (pageid, title, fullimageurl, pageurl, pageshorturl))
This results in the following CSV-formatted output:
"MediaID","Title","FullimageUrl","PageUrl","PageShortUrl"
"M133345691","File:Agrostis Spica venti (modern=Apera spica-venti) - Pl0151 - DescriptionFR01 - FloraBatava-KB-v02.jpg","https://upload.wikimedia.org/wikipedia/commons/d/d2/Agrostis_Spica_venti_%28modern%3DApera_spica-venti%29_-_Pl0151_-_DescriptionFR01_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/wiki/File:Agrostis_Spica_venti_(modern%3DApera_spica-venti)_-_Pl0151_-_DescriptionFR01_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/w/index.php?curid=133345691"
"M133345689","File:Agrostis Spica venti (modern=Apera spica-venti) - Pl0151 - DescriptionNL01 - FloraBatava-KB-v02.jpg","https://upload.wikimedia.org/wikipedia/commons/2/29/Agrostis_Spica_venti_%28modern%3DApera_spica-venti%29_-_Pl0151_-_DescriptionNL01_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/wiki/File:Agrostis_Spica_venti_(modern%3DApera_spica-venti)_-_Pl0151_-_DescriptionNL01_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/w/index.php?curid=133345689"
"M132800954","File:Agrostis Spica venti (modern=Apera spica-venti) - Pl0151 - FloraBatava-KB-v02.jpg","https://upload.wikimedia.org/wikipedia/commons/e/e8/Agrostis_Spica_venti_%28modern%3DApera_spica-venti%29_-_Pl0151_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/wiki/File:Agrostis_Spica_venti_(modern%3DApera_spica-venti)_-_Pl0151_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/w/index.php?curid=132800954"
....
Variation 2, SPARQL query embedded in the script
- Based on the Python code on this page and this SPARQL query
- To retrieve data from the Commons Query Service via third-party scripts (such as the one below), we need to log in with our Wikimedia credentials. See https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint for details on how to programmatically access the query service, including some examples in Python.
- See for help on the query https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI
from http.cookiejar import Cookie
import os
import requests
from urllib.parse import urlparse

sparqlquery = """
# Based on the SPARQL query at https://www.wikidata.org/wiki/Wikidata:WikiProject_Collection_highlights_National_Library_of_the_Netherlands/Flora_Batava/Queries#Get_all_files_and_their_URLs_in_Volume_2
SELECT ?MediaID ?Title ?FullimageUrl1 ?FullimageUrl2 ?PageUrl ?PageShortUrl1 ?PageShortUrl2
WITH
{
  SELECT ?Title ?pageid ?FullimageUrl1
  WHERE
  {
    SERVICE wikibase:mwapi
    {
      bd:serviceParam wikibase:api "Generator" .
      bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
      bd:serviceParam mwapi:gcmtitle "Category:Flora_Batava_-_KB_copy,_Volume_2,_1807" .
      bd:serviceParam mwapi:generator "categorymembers" .
      bd:serviceParam mwapi:gcmlimit "500" .
      bd:serviceParam mwapi:prop "imageinfo" .
      bd:serviceParam mwapi:iiprop "url" .
      ?Title wikibase:apiOutput mwapi:title .
      ?pageid wikibase:apiOutput "@pageid" .
      ?ns wikibase:apiOutput "@ns" .
      ?FullimageUrl1 wikibase:apiOutputURI "imageinfo/ii/@url" .
    }
    FILTER (?ns = "6") # Files only
  }
} AS %get_files
WHERE
{
  INCLUDE %get_files
  BIND(CONCAT('M', ?pageid) AS ?MediaID)
  BIND(REPLACE(?Title, " ", "_", "i") AS ?p)
  BIND(URI(CONCAT("https://commons.wikimedia.org/w/index.php?title=Special:Redirect/file&wpvalue=", ?p)) AS ?FullimageUrl2)
  BIND(URI(CONCAT('https://commons.wikimedia.org/wiki/', ?p)) AS ?PageUrl) .
  BIND(URI(CONCAT('https://commons.wikimedia.org/w/index.php?curid=', ?pageid)) AS ?PageShortUrl1)
  BIND(URI(CONCAT('https://commons.wikimedia.org/entity/', ?MediaID)) AS ?PageShortUrl2)
}
ORDER BY ?MediaID # approx. the same order as the plate numbers
"""
# To retrieve data from the Commons Query Service via third-party scripts (such as this one),
# we need to log in with our Wikimedia credentials.
# See https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint
# for details on how to programmatically access the query service.
def init_session(endpoint, token):
    domain = urlparse(endpoint).netloc
    session = requests.Session()
    session.headers.update({
        'User-Agent': 'Commons Query Service example via Python by User:OlafJanssen // olaf.janssen@kb.nl',
    })
    session.cookies.set_cookie(Cookie(0, 'wcqsOauth', token, None, False, domain, False, False, '/', True,
                                      False, None, True, None, None, {}))
    return session

ENDPOINT = 'https://commons-query.wikimedia.org/sparql'
# Set the WCQS_AUTH_TOKEN environment variable to your own token; never hard-code it.
# See https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint
session = init_session(ENDPOINT, os.environ['WCQS_AUTH_TOKEN'])
response = session.post(
    url=ENDPOINT,
    data={'query': sparqlquery},
    headers={'Accept': 'application/json'}
)
response.raise_for_status()
data = response.json().get("results", {}).get("bindings", [])
# Header of CSV output
print('"MediaID","Title","FullimageUrl1","FullimageUrl2","PageUrl","PageShortUrl1","PageShortUrl2"')
for row in data:
    mediaid = row.get('MediaID', {}).get('value', '')
    title = row.get('Title', {}).get('value', '')
    fullimageurl1 = row.get('FullimageUrl1', {}).get('value', '')
    fullimageurl2 = row.get('FullimageUrl2', {}).get('value', '')
    pageurl = row.get('PageUrl', {}).get('value', '')
    pageshorturl1 = row.get('PageShortUrl1', {}).get('value', '')
    pageshorturl2 = row.get('PageShortUrl2', {}).get('value', '')
    # Body of CSV output
    print('"%s","%s","%s","%s","%s","%s","%s"' % (mediaid, title, fullimageurl1, fullimageurl2, pageurl, pageshorturl1, pageshorturl2))
This results in the following CSV-formatted output:
"MediaID","Title","FullimageUrl1","FullimageUrl2","PageUrl","PageShortUrl1","PageShortUrl2"
"M131745248","File:Title page Flora Batava (KB), Volume 2, 1807.jpg","https://upload.wikimedia.org/wikipedia/commons/a/a4/Title_page_Flora_Batava_%28KB%29%2C_Volume_2%2C_1807.jpg","https://commons.wikimedia.org/w/index.php?title=Special:Redirect/file&wpvalue=File:Title_page_Flora_Batava_(KB),_Volume_2,_1807.jpg","https://commons.wikimedia.org/wiki/File:Title_page_Flora_Batava_(KB),_Volume_2,_1807.jpg","https://commons.wikimedia.org/w/index.php?curid=131745248","https://commons.wikimedia.org/entity/M131745248"
"M132800657","File:Veronica officinalis - Pl0081 - FloraBatava-KB-v02.jpg","https://upload.wikimedia.org/wikipedia/commons/2/23/Veronica_officinalis_-_Pl0081_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/w/index.php?title=Special:Redirect/file&wpvalue=File:Veronica_officinalis_-_Pl0081_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/wiki/File:Veronica_officinalis_-_Pl0081_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/w/index.php?curid=132800657","https://commons.wikimedia.org/entity/M132800657"
"M132800741","File:Glaux maritima - Pl0082 - FloraBatava-KB-v02.jpg","https://upload.wikimedia.org/wikipedia/commons/5/5a/Glaux_maritima_-_Pl0082_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/w/index.php?title=Special:Redirect/file&wpvalue=File:Glaux_maritima_-_Pl0082_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/wiki/File:Glaux_maritima_-_Pl0082_-_FloraBatava-KB-v02.jpg","https://commons.wikimedia.org/w/index.php?curid=132800741","https://commons.wikimedia.org/entity/M132800741"
....
Variation 3: as variation 2, but now via a SPARQL JSON URL
from http.cookiejar import Cookie
import os
import requests
from urllib.parse import urlparse

# JSON URL rendering of the query on https://www.wikidata.org/wiki/Wikidata:WikiProject_Collection_highlights_National_Library_of_the_Netherlands/Flora_Batava/Queries#Get_all_files_and_their_URLs_in_Volume_2
sparqljsonurl = "https://commons-query.wikimedia.org/sparql?query=%23%20For%20File%3AAgrostis%20Spica%20venti%20(modern%3DApera%20spica-venti)%20-%20Pl0151%20-%20FloraBatava-KB-v02.jpg%0A%23%20-%20MediaID%20%3D%20M132800954%20(see%20%5Bhttps%3A%2F%2Fcommons.wikimedia.org%2Fw%2Findex.php%3Ftitle%3DFile%3AAgrostis_Spica_venti_(modern%253DApera_spica-venti)_-_Pl0151_-_FloraBatava-KB-v02.jpg%26action%3Dinfo%20this%20page%5D)%0A%23%20-%20Title%20(as%20string)%3D%20File%3AAgrostis%20Spica%20venti%20(modern%3DApera%20spica-venti)%20-%20Pl0151%20-%20FloraBatava-KB-v02.jpg%0A%23%20-%20FullimageUrl1%20%3D%20https%3A%2F%2Fupload.wikimedia.org%2Fwikipedia%2Fcommons%2Fe%2Fe8%2FAgrostis_Spica_venti_%2528modern%253DApera_spica-venti%2529_-_Pl0151_-_FloraBatava-KB-v02.jpg%20%0A%23%20-%20FullimageUrl2%2C%20via%20Special%3ARedirect%20%3D%20https%3A%2F%2Fcommons.wikimedia.org%2Fw%2Findex.php%3Ftitle%3DSpecial%3ARedirect%2Ffile%26wpvalue%3DFile%3AAgrostis_Spica_venti_(modern%253DApera_spica-venti)_-_Pl0151_-_FloraBatava-KB-v02.jpg%0A%23%20-%20PageUrl%20%3D%20https%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FFile%3AAgrostis_Spica_venti_(modern%253DApera_spica-venti)_-_Pl0151_-_FloraBatava-KB-v02.jpg%0A%23%20-%20PageShortUrl%20%3D%20https%3A%2F%2Fcommons.wikimedia.org%2Fw%2Findex.php%3Fcurid%3D132800954%0A%23%20-%20PageShortUrl2%2C%20via%20the%20concept%20URI%20%3D%20https%3A%2F%2Fcommons.wikimedia.org%2Fentity%2FM132800954%0A%23%0A%23%20See%20for%20help%20https%3A%2F%2Fwww.mediawiki.org%2Fwiki%2FWikidata_Query_Service%2FUser_Manual%2FMWAPI%0A%0ASELECT%20%3FMediaID%20%3FTitle%20%3FFullimageUrl1%20%3FFullimageUrl2%20%3FPageUrl%20%3FPageShortUrl1%20%3FPageShortUrl2%0AWITH%0A%7B%0A%20%20SELECT%20%3FTitle%20%3Fpageid%20%20%3FFullimageUrl1%20%0A%20%20WHERE%0A%7B%0A%20%20SERVICE%20wikibase%3Amwapi%0A%20%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Aapi%20%22Generator%22%20.%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Aendpoint%20%22commons.wikimedia.org%22%20.%0A%20%20%20%20bd%3AserviceParam%20mwapi%3Agcmtitle%20%22Category%3AFlora_Batava_-_KB_copy%2C_Volume_2%2C_1807%22%20.%0A%20%20%20%20bd%3AserviceParam%20mwapi%3Agenerator%20%22categorymembers%22%20.%0A%20%20%20%20bd%3AserviceParam%20mwapi%3Agcmlimit%20%22500%22%20.%0A%20%20%20%20bd%3AserviceParam%20mwapi%3Aprop%20%22imageinfo%22.%0A%20%20%20%20bd%3AserviceParam%20mwapi%3Aiiprop%20%22url%22.%0A%20%20%20%20%3FTitle%20wikibase%3AapiOutput%20mwapi%3Atitle%20.%0A%20%20%20%20%3Fpageid%20wikibase%3AapiOutput%20%22%40pageid%22%20.%0A%20%20%20%20%3Fns%20wikibase%3AapiOutput%20%22%40ns%22.%0A%20%20%20%20%3FFullimageUrl1%20wikibase%3AapiOutputURI%20%22imageinfo%2Fii%2F%40url%22.%0A%20%20%7D%0A%20%20FILTER%20(%3Fns%20%3D%20%226%22)%20%20%23Files%20only%0A%20%20%7D%0A%7D%20AS%20%25get_files%0AWHERE%0A%7B%0A%20%20INCLUDE%20%25get_files%0A%20%20BIND(CONCAT(%27M%27%2C%20%3Fpageid)%20AS%20%3FMediaID)%20%20%20%20%0A%20%20BIND(REPLACE(%3FTitle%2C%20%22%20%22%2C%20%22_%22%2C%20%22i%22)%20AS%20%3Fp)%20%20%20%20%20%20%0A%20%20BIND(URI(CONCAT(%22https%3A%2F%2Fcommons.wikimedia.org%2Fw%2Findex.php%3Ftitle%3DSpecial%3ARedirect%2Ffile%26wpvalue%3D%22%2C%20%3Fp))%20AS%20%3FFullimageUrl2)%20%20%20%20%20%20%0A%20%20BIND(URI(CONCAT(%27https%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2F%27%2C%20%3Fp))%20AS%20%3FPageUrl).%20%0A%20%20BIND(URI(CONCAT(%27https%3A%2F%2Fcommons.wikimedia.org%2Fw%2Findex.php%3Fcurid%3D%27%2C%20%3Fpageid))%20AS%20%3FPageShortUrl1)%0A%20%20BIND(URI(CONCAT(%27https%3A%2F%2Fcommons.wikimedia.org%2Fentity%2F%27%2C%20%3FMediaID))%20AS%20%3FPageShortUrl2)%0A%7D%20%0AORDER%20BY%20%3FMediaID%20%23approx%20the%20same%20order%20as%20the%20plate%20numbers&format=json"

# To retrieve data from the Commons Query Service via third-party scripts (such as this one),
# we need to log in with our Wikimedia credentials.
# See https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint
# for details on how to programmatically access the query service.
def init_session(endpoint, token):
    domain = urlparse(endpoint).netloc
    session = requests.Session()
    session.headers.update({
        'User-Agent': 'Commons Query Service example via Python by User:OlafJanssen // olaf.janssen@kb.nl',
    })
    session.cookies.set_cookie(Cookie(0, 'wcqsOauth', token, None, False, domain, False, False, '/', True,
                                      False, None, True, None, None, {}))
    return session

ENDPOINT = 'https://commons-query.wikimedia.org/sparql'
# Set the WCQS_AUTH_TOKEN environment variable to your own token; never hard-code it.
# See https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint
session = init_session(ENDPOINT, os.environ['WCQS_AUTH_TOKEN'])
response = session.post(
    url=sparqljsonurl,
    headers={'Accept': 'application/json'}
)
response.raise_for_status()
data = response.json().get("results", {}).get("bindings", [])
# Header of CSV output
print('"MediaID","Title","FullimageUrl1","FullimageUrl2","PageUrl","PageShortUrl1","PageShortUrl2"')
for row in data:
    mediaid = row.get('MediaID', {}).get('value', '')
    title = row.get('Title', {}).get('value', '')
    fullimageurl1 = row.get('FullimageUrl1', {}).get('value', '')
    fullimageurl2 = row.get('FullimageUrl2', {}).get('value', '')
    pageurl = row.get('PageUrl', {}).get('value', '')
    pageshorturl1 = row.get('PageShortUrl1', {}).get('value', '')
    pageshorturl2 = row.get('PageShortUrl2', {}).get('value', '')
    # Body of CSV output
    print('"%s","%s","%s","%s","%s","%s","%s"' % (mediaid, title, fullimageurl1, fullimageurl2, pageurl, pageshorturl1, pageshorturl2))
This results in the same CSV-formatted data as above.
Download all full-size images in Volume 2 (and save to a zipfile)
# Download full-size images from a Commons category to a folder, and optionally create a zipfile from that folder
# https://opendata.stackexchange.com/questions/13381/wikimedia-commons-api-image-by-category
import os
import json
import requests
import shutil

s = requests.Session()
apibaseurl = "https://commons.wikimedia.org/w/api.php"
cat = "Category:Flora_Batava_-_KB_copy,_Volume_2,_1807"
cat_stripped = cat[9:]  # For the name of the file folder
link = "https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=" \
       + cat + "&gcmlimit=500&gcmtype=file&prop=imageinfo&iiprop=url&format=json"
headers = {'Accept': 'application/json', 'User-Agent': 'User:OlafJanssen - olaf.janssen@kb.nl - ' + cat}
r = s.get(url=link, headers=headers)
filesdata = json.loads(r.text)
pages = list(filesdata.get('query').get('pages').values())
# for p in range(0, len(pages)):  # Download all files in the category - careful, this can give a very big zipfile if the category holds many files!
for p in range(0, 4):  # Download selected files in the category
    filetitle = pages[p].get('title').replace(" ", "_")[5:]  # Strip the "File:" part from the name
    fullimageurl = pages[p].get('imageinfo')[0].get('url')
    print('"%s","%s"' % (filetitle, fullimageurl))
    # Next:
    # 1) Filter only images (png, jpg, jpeg, tiff), so do not include pdf files
    # 2) Collect all downloaded images in the output folder 'cat_stripped'
    # 3) Zip that folder and write the zip to disk
    if not os.path.exists(cat_stripped):
        os.mkdir(cat_stripped)
    if not os.path.exists(os.path.join(cat_stripped, filetitle)):  # Don't overwrite
        if filetitle.lower().endswith(('.png', '.jpg', '.jpeg', '.tiff')):
            # https://stackoverflow.com/questions/30229231/python-save-image-from-url
            with open(os.path.join(cat_stripped, filetitle), 'wb') as f:
                # Headers are compulsory!
                response = requests.get(fullimageurl, stream=True, headers={'User-Agent': 'Commons category downloader by User:OlafJanssen - olaf.janssen@kb.nl'})
                if not response.ok:
                    print(response)
                for block in response.iter_content(1024):
                    if not block:
                        break
                    f.write(block)
# Optionally, make a zip from the folder
# shutil.make_archive(cat_stripped, 'zip', cat_stripped)
To add
HTML page for rendering "Historical vs. modern images and distribution maps of plants": https://www.wikidata.org/wiki/Wikidata:WikiProject_Collection_highlights_National_Library_of_the_Netherlands/Flora_Batava/Queries#Historical_vs._modern_images_and_distribution_maps_of_plants. Based on this query (as JSON), a script to render an HTML page is to be added.
PyWikiBot
To do: include some examples of interactions via Pywikibot.
See
- Homepage of Pywikibot:
- Pywikibot - Python 3 Tutorial: https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial
- Run it in PAWS
import pywikibot

site = pywikibot.Site('nl')
page = pywikibot.Page(site, 'Hugo Brandt Corstius')
if 'wikibase_item' in page.properties():
    print(page.properties()['wikibase_item'])
else:
    pass  # No Wikidata item for this page
Jupyter notebooks -- TODO
- To add...