Wikidata:Property proposal/Internet Encyclopedia of Ukraine ID
Internet Encyclopedia of Ukraine ID
Originally proposed at Wikidata:Property proposal/Authority control
Motivation
The Internet Encyclopedia of Ukraine is a useful reference on Ukraine, written by professionals, actively updated, and covering a broad range of subtopics.
It is based on the six-volume Encyclopedia of Ukraine (Q774515) (1977–85), called the “most important reference work on Ukraine in English” by the authors of the Historical Dictionary of Ukraine (Toronto 2005, p. 745), which was in turn based on the 14-volume Ukrainian-language Entsyklopediia ukraïnoznavstva (Munich 1949–95). It is linked 1,652 times in en.wikipedia, 598 times in de.wikipedia, and 231 times in uk.wikipedia. It appears over 200 times in Google Scholar results.
Note: an identifier in the form pages\K\Y\Kyiv seems to work fine in my browser, but in the address bar the URL query is converted to a URL-encoded version, pages%5CK%5CY%5CKyiv. I believe this may be a function of the web browser, and I don’t know if it’s safe to use the raw text version. —Michael Z. 04:07, 2 January 2021 (UTC)
- A bit of research confirms that the backslash character “\” is unsafe in a URL, according to RFC 1738 Uniform Resource Locators (URL). It is not one of the reserved or unreserved URL characters, and so must be percent-encoded. That said, currently http://www.encyclopediaofukraine.com/display.asp?linkpath=pages\K\Y\Kyiv works for me, but we should adhere to basic responsible software practices (be liberal in what you accept, and conservative in what you produce).
- Is there a way to implement something like http://www.encyclopediaofukraine.com/display.asp?linkpath=URLENCODE($1).htm, to render a safe URL for a link? If not, then we must store %5C.
- Researching URI details has also made me aware that the query portion of the URL is technically ?linkpath=pages\K\Y\Kyiv.htm, and the content of the query argument linkpath is pages\K\Y\Kyiv.htm, including the .htm (the path name ends with the file extension .asp, and the query starts with the question mark). So a safer, future-proof version of the ID might be the full string pages\K\Y\Kyiv.htm, URL-encoded as pages%5CK%5CY%5CKyiv.htm.
- On the other hand, random testing shows that in practice, the .htm is not currently required, and pages\K\Y\Kyiv (pages%5CK%5CY%5CKyiv) is the most minimal empirical ID. I am updating the proposal to reflect this. —Michael Z. 17:54, 2 January 2021 (UTC)
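A minimal sketch of the URLENCODE($1) idea raised above, using Python's standard library. The formatter string is modelled on the URL quoted in this proposal, and the names FORMATTER and build_url are hypothetical, chosen for illustration only; this is not a description of how Wikidata itself builds links.

```python
from urllib.parse import quote

# Hypothetical formatter pattern, modelled on the URL quoted in this proposal.
FORMATTER = "http://www.encyclopediaofukraine.com/display.asp?linkpath=$1"


def build_url(raw_id: str) -> str:
    """Percent-encode the raw identifier and substitute it for $1."""
    # safe="" makes quote() encode every character outside the unreserved
    # set (letters, digits, "_", ".", "-", "~"), so the backslash becomes %5C.
    return FORMATTER.replace("$1", quote(raw_id, safe=""))


print(build_url(r"pages\K\Y\Kyiv"))
# http://www.encyclopediaofukraine.com/display.asp?linkpath=pages%5CK%5CY%5CKyiv
```

With encoding done at link-building time like this, the stored ID can remain the readable backslash form.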
Discussion
- Support and I would prefer the form with "\" instead of "%5C". --Epìdosis 11:46, 2 January 2021 (UTC)
- The backslash “\” can have a special meaning in text strings, so I believe it must be sent in percent-encoded form “%5C”. If our URL-forming mechanism can’t do that on the fly, then we probably need to store it in this form. It looks ugly, but it is straightforward to copy it from URLs in the reference source. —Michael Z. 15:02, 2 January 2021 (UTC)
- @Mzajac: In fact Wikidata does URL-encode all IDs when appending to formatter URLs, so you should use the '\' format (otherwise these will be double-encoded, which will probably not work). ArthurPSmith (talk) 16:03, 11 January 2021 (UTC)
- Thank you! I have updated the proposal. —Michael Z. 17:16, 11 January 2021 (UTC)
- Support--Adam Harangozó (talk) 13:46, 2 January 2021 (UTC)
- Support --Gerwoman (talk) 15:19, 2 January 2021 (UTC)
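The double-encoding concern raised in the thread above can be illustrated with a short sketch. It assumes only that the link builder percent-encodes whatever value is stored, as described in the discussion; the variable names are hypothetical, and the encoder here is Python's urllib, not Wikidata's own code.

```python
from urllib.parse import quote, unquote

raw_id = r"pages\K\Y\Kyiv"             # readable form with backslashes
stored_encoded = "pages%5CK%5CY%5CKyiv"  # already percent-encoded form

# Encoding the raw form once gives the intended query value.
once = quote(raw_id, safe="")

# Encoding the already-encoded form escapes each "%" as "%25".
twice = quote(stored_encoded, safe="")

print(once)   # pages%5CK%5CY%5CKyiv
print(twice)  # pages%255CK%255CY%255CKyiv

# A server that decodes once would see literal "%5C" text, not backslashes.
print(unquote(twice) == raw_id)  # False
```

This is why storing the backslash form is the safer choice when the encoding step is applied automatically.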