Wikidata:Requests for permissions/Bot/Addbot 4
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved Vogone talk 15:44, 31 July 2013 (UTC)[reply]
Addbot 4
Addbot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Addshore (talk • contribs • logs)
Task/s: Importing Geo Coordinates
Function details: The bot will scan wikipedia and find coords that are not added to wikidata.
- Will only add a coord if the GND is a geographical place
- Will only add a coord if there is no existing coord
- Will add a reference for the coord added (linking to the wikipedia item)
--·addshore· talk to me! 14:08, 31 July 2013 (UTC)[reply]
- Please see the test edits below. As you can't currently see geodata in diffs, this seems like the easiest way to link to them.
[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] ·addshore· talk to me! 14:49, 31 July 2013 (UTC)[reply]
- Test edits look good. I assume your bot could also import geodata from other Wikipedias (like e. g. dewiki)? That would be great. Vogone talk 15:25, 31 July 2013 (UTC)[reply]
- Yup, as long as the conditions listed above are met! ·addshore· talk to me! 15:26, 31 July 2013 (UTC)[reply]
Support seems no problem --ReviDiscussSUL Info 15:42, 31 July 2013 (UTC)[reply]
Sorry, that I am late with this comment, but the test edits do not look good. First, as you cannot see the details of coordinate values in the normal user interface, a better way to link to the test edits would be with API links like http://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&entity=Q300856&property=P625 Here you will see that the precision of the value is not indicated. Please add precision according to how the value is stated in the source. Byrial (talk) 16:20, 31 July 2013 (UTC)[reply]
- this looks better? :) http://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&entity=q1003873&property=P625 ·addshore· talk to me! 17:58, 31 July 2013 (UTC)[reply]
- No. Now there is a precision value, but it is wrong. The source (en:Hawthorne, Florida) gives the coordinates with 1 arc second precision, that is 0.00027777... degrees. But the specified precision at Wikidata is 0.0001 degree. Byrial (talk) 19:28, 31 July 2013 (UTC)[reply]
- I am mildly confused (although it may just be because I am tired). I get the data latitude="29.5881" longitude="-82.0839". How can this precision become 0.00027777? Apparently my method is flawed! ·addshore· talk to me! 19:37, 31 July 2013 (UTC)[reply]
- I have reverted the above change. I changed the way the precision was calculated and copied exactly what is done in pywikibot, which resulted in these two http://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&entity=q1180946&property=P625 http://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&entity=q1180946&property=P625 which I am also presuming to be wrong...
- pywikibot:
self._precision = math.degrees(self._dim / (radius * math.cos(math.radians(self.lat))))
- php:
$precision = rad2deg( $coord['dim'] / ( 6378137 * cos( deg2rad( $coord['lat'] ) ) ) );
- How are you calculating the precision? ·addshore· talk to me! 20:23, 31 July 2013 (UTC)[reply]
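For readers following along: the two one-liners quoted above appear to convert a size in metres (the `dim` field) into an angle in degrees of longitude at a given latitude. A minimal Python sketch of that calculation, assuming the same radius of 6378137 m that the PHP line uses. The function name `dim_to_degrees` is illustrative and not part of pywikibot.

```python
import math

EARTH_RADIUS_M = 6378137  # radius used in the PHP line quoted above


def dim_to_degrees(dim_m, lat_deg, radius_m=EARTH_RADIUS_M):
    """Angle in degrees of longitude spanned by dim_m metres at latitude lat_deg."""
    return math.degrees(dim_m / (radius_m * math.cos(math.radians(lat_deg))))


# An object 1000 m across at the equator spans roughly 0.009 degrees.
print(dim_to_degrees(1000, 0.0))
```

The result depends on the object's size and latitude, not on how many digits the source states, so it is a different quantity from the precision of a quoted coordinate.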
- You should not round the decimal values to only 4 decimals. The latitude is 29°35′17″N, that is 29 + (35 + 17/60)/60 = 29.58805556 degrees with the precision 0.00027778. The longitude is 82°5′2″W = 82 + (5 + 2/60)/60 = 82.08388889 degrees with the precision 0.00027778. The precision is 0.00027778 degree because that is the decimal value of 1 arc second (1/60/60 = 0.00027778) rounded to 5 significant digits. Byrial (talk) 20:40, 31 July 2013 (UTC)[reply]
- Yes, the new precision of "0.118055714087" is much more wrong. It is the precision of decimal degree values, and you need no cosinus (is that the English word?) or radians for that. Byrial (talk) 20:46, 31 July 2013 (UTC)[reply]
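The degree/minute/second arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation, assuming the precision is one unit of the finest component the source states. The function name `dms_to_decimal` is hypothetical.

```python
def dms_to_decimal(degrees, minutes=None, seconds=None, hemisphere="N"):
    """Return (decimal_degrees, precision_in_degrees) for a DMS coordinate.

    The precision is one unit of the finest component that was given:
    1 degree, 1 arc minute (1/60) or 1 arc second (1/3600).
    """
    value = float(degrees)
    precision = 1.0
    if minutes is not None:
        value += minutes / 60
        precision = 1 / 60
    if seconds is not None:
        value += seconds / 3600
        precision = 1 / 3600
    if hemisphere in ("S", "W"):
        value = -value  # southern and western hemispheres are negative
    return value, precision


lat, lat_prec = dms_to_decimal(29, 35, 17, "N")
lon, lon_prec = dms_to_decimal(82, 5, 2, "W")
print(round(lat, 8), round(lon, 8), round(lat_prec, 8))
# 29.58805556 -82.08388889 0.00027778
```

No cosine or radius is involved: the precision follows from how the value is written, not from where on the globe it lies.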
- But this is not how the data is stored / how it is retrieved. (see the sample data below). All of the lats and longs are in decimal form (not all to 4dp though).
Extended content
+-------+------------+----------+------------+---------+----------+---------+-----------+----------------------------------+------------+-----------+
| gt_id | gt_page_id | gt_globe | gt_primary | gt_lat  | gt_lon   | gt_dim  | gt_type   | gt_name                          | gt_country | gt_region |
+-------+------------+----------+------------+---------+----------+---------+-----------+----------------------------------+------------+-----------+
|     3 |     895125 | earth    |          1 | 54.3616 |  -0.6697 |    1000 | landmark  |                                  | GB         | NULL      |
|     6 |    2386615 | earth    |          1 | 12.9758 |  80.2205 |   10000 | city      |                                  | IN         | TN        |
|    18 |     624370 | earth    |          1 |   52.65 |    17.95 |    1000 | NULL      |                                  | NULL       | NULL      |
|    19 |     624370 | earth    |          0 |   52.65 |    17.95 |   10000 | city      |                                  | PL         | NULL      |
|    20 |   15498044 | earth    |          1 | 54.6079 |  -3.1467 |     100 | edu       |                                  | GB         | NULL      |
|    25 |    3340862 | earth    |          1 |  39.275 | -77.0032 |    1000 | landmark  |                                  | NULL       | NULL      |
|    37 |    1767975 | earth    |          1 |  51.102 | -114.085 |    1000 | NULL      |                                  | NULL       | NULL      |
|    43 |    1767981 | earth    |          1 |      51 | -113.967 |    1000 | NULL      |                                  | NULL       | NULL      |
|    80 |     328162 | earth    |          1 |    43.8 |    143.9 |   10000 | city      |                                  | JP         | NULL      |
|    89 |   20941111 | earth    |          0 |      90 |       12 |   10000 | waterbody | Arctic Ocean                     | NULL       | NULL      |
|    98 |   20941111 | earth    |          0 |   63.25 |       12 | 1000000 | country   | Norway                           | NULL       | NULL      |
|    99 |   20941111 | earth    |          0 |    59.9 |       12 | 1000000 | country   | Sweden                           | NULL       | NULL      |
|   100 |   20941111 | earth    |          0 |   57.35 |       12 |   10000 | waterbody | Kattegat                         | NULL       | NULL      |
+-------+------------+----------+------------+---------+----------+---------+-----------+----------------------------------+------------+-----------+
- Huh? You state "Imported from English wikipedia" as source. Where in the English Wikipedia did you find that table? In those enwiki pages I looked at to verify the edits, all coordinates were given as degree/arc minutes/arc seconds values with a precision of 1 arc second = 0.00027778 degrees. But the precision may of course be different for different coordinate values. Byrial (talk) 12:11, 1 August 2013 (UTC)[reply]
- Those are the values stored in the enwiki database, they are also the values you get when you query the api. I'm slightly confused as to what to do now. ·addshore· talk to me! 13:45, 1 August 2013 (UTC)[reply]
- I don't know what enwiki database you are talking about. But I just tried the api.php?action=query&prop=coordinates API call, and compared the results with the wiki texts, and found that the API gives rounded values for latitude and longitude, and that it is impossible to find the original precision from the API values. Based on that I will recommend that you only use the actual wikipages as sources, not the API, nor the to me unknown database you mentioned. Byrial (talk) 14:34, 1 August 2013 (UTC)[reply]
- The database is one of the replicas of the enwiki databases stored on labs (would also be in the dumps). I guess I will have to use the database to easily find the articles to check and then scrape the wikitext template for the coords to then enter to wikidata. Thanks for the help! Seems a bit weird that the backend db stores them in this fashion but oh well! I will try and write the change at some point tonight and will test it against the values mentioned above. ·addshore· talk to me! 14:38, 1 August 2013 (UTC)[reply]
- This wikitext scraping is just going to get silly.... There are 17 redirects to the Coord template on enwiki alone, and who knows how many other templates >.< (actually about 600,000 pages which don't use the template). Any suggestions? ·addshore· talk to me! 14:43, 1 August 2013 (UTC)[reply]
- I would consider disregarding the templates and requesting the wikitext with all templates expanded ("api.php?action=query&prop=revisions&rvprop=content&rvexpandtemplates="), and then looking for link text like //tools.wmflabs.org/geohack/geohack.php?pagename=Aalborg_Municipality&params=57_2_47_N_9_55_9_E_type:landmark_region:DK Byrial (talk) 16:44, 1 August 2013 (UTC)[reply]
- That is indeed a great idea :), Parse the geohack urls and work from there! ·addshore· talk to me! 16:52, 1 August 2013 (UTC)[reply]
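One way to parse such geohack links is sketched below in Python. It is not the bot's actual code (which is PHP). The regular expression, the function names and the rule that the coarser of the two precisions is reported are all assumptions made for illustration. Only the degrees/minutes/seconds form with hemisphere letters is handled, and a decimal value such as `29.5833_N` would be reported with a precision of 1 degree, which is the weakness raised later in the thread.

```python
import re

# Captures the "params" value of a geohack link found in expanded page text.
GEOHACK_PARAMS = re.compile(r"geohack\.php\?\S*?params=([\w.:;-]+)")


def find_geohack_params(expanded_text):
    """Return every geohack params string found in the text."""
    return GEOHACK_PARAMS.findall(expanded_text)


def parse_geohack_params(params):
    """Parse e.g. '57_2_47_N_9_55_9_E_type:landmark' into (lat, lon, precision)."""
    values, precisions, parts = [], [], []
    for token in params.split("_"):
        if token not in ("N", "S", "E", "W"):
            parts.append(token)
            continue
        # Keep each number's position: 0 = degrees, 1 = minutes, 2 = seconds.
        given = [(i, float(p)) for i, p in enumerate(parts) if p != ""]
        if not given:
            raise ValueError("no numeric part before " + token)
        value = sum(n / 60 ** i for i, n in given)
        precisions.append(1 / 60 ** max(i for i, _ in given))
        values.append(-value if token in ("S", "W") else value)
        parts = []
        if len(values) == 2:
            break  # anything after the longitude (type:, region:) is ignored
    if len(values) != 2:
        raise ValueError("expected a latitude and a longitude")
    return values[0], values[1], max(precisions)


link = ('<a href="//tools.wmflabs.org/geohack/geohack.php?'
        'pagename=Aalborg_Municipality&amp;params='
        '57_2_47_N_9_55_9_E_type:landmark_region:DK">')
for found in find_geohack_params(link):
    print(parse_geohack_params(found))
```

Reporting the coarser of the latitude and longitude precisions is a conservative choice: the claim then never states more precision than either component had in the source.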
- Well I added the parsing of urls from text and stole a php class from geohack adding in the precision calculations. here. Will run a few small tests shortly! ·addshore· talk to me! 11:09, 2 August 2013 (UTC)[reply]
[31], [32]. Currently the function for determining the precision can be seen here. Would it be better to use this (i.e. 1/60 etc.) or to specify decimal values? ·addshore· talk to me! 11:40, 2 August 2013 (UTC)[reply]
- The precision is as at enwiki for the 4 examples, but it is IMHO silly to indicate the position of states with a precision of 1 arc second as enwiki does in example 3 and 4 (Bavaria and Brandenburg). The longitude for Brandenburg is wrong: 13°0′29″E is 13.0080556 as decimal value. Remember to look for arc seconds even if the arc minute value is 0. Do you have any examples where the Wikipedia gives the coordinates as decimal degree values? I suspect that you will get the precision wrong in such cases. Byrial (talk) 14:32, 2 August 2013 (UTC)[reply]
- Hmm. I have changed from empty() to isset(), as empty() treats 0 as empty (this should fix the incorrect longitude value). I have also added test cases for 0 values here. Are there cases where the geohack urls would have something such as '29.583333333333___N_82.083333333333___E' in them? Indeed, currently for such values the precision is incorrect. ·addshore· talk to me! 14:57, 2 August 2013 (UTC)[reply]
- Okay! I have added the ability to parse decimals and also then give the correct precision! See here. Is there anything else you can spot? (I will run some more test edits shortly) ·addshore· talk to me! 15:16, 2 August 2013 (UTC)[reply]
- Yep, after further testing all looks good!
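The `empty()` versus `isset()` fix has a close analogue in Python, where `0` is falsy. The sketch below shows one way such a bug can arise. It is an illustration with hypothetical function names, not a reconstruction of the bot's PHP code, and uses the Brandenburg longitude 13°0′29″E from the discussion.

```python
def to_decimal_buggy(degrees, minutes=None, seconds=None):
    """Truthiness check: a minutes value of 0 is treated as missing."""
    value = float(degrees)
    if minutes:  # wrong: 0 is falsy, so the seconds are never reached
        value += minutes / 60
        if seconds:
            value += seconds / 3600
    return value


def to_decimal_fixed(degrees, minutes=None, seconds=None):
    """Explicit None check: 0 is a valid value, only None means missing."""
    value = float(degrees)
    if minutes is not None:
        value += minutes / 60
    if seconds is not None:
        value += seconds / 3600
    return value


print(to_decimal_buggy(13, 0, 29))            # 13.0, the 29 seconds are lost
print(round(to_decimal_fixed(13, 0, 29), 7))  # 13.0080556
```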
- <value latitude="37.383333333333" longitude="-5.9833333333333" altitude="" precision="0.016666666666667" globe="http://www.wikidata.org/entity/Q2"/>
- <value latitude="48.7775" longitude="11.431111111111" altitude="" precision="0.00027777777777778" globe="http://www.wikidata.org/entity/Q2"/>
- <value latitude="52.361944444444" longitude="13.008055555556" altitude="" precision="0.00027777777777778" globe="http://www.wikidata.org/entity/Q2"/>
- These are all correct. I will now start the script and will continue to watch it for a while :) ·addshore· talk to me! 11:25, 3 August 2013 (UTC)[reply]
Amended function details: The bot will scan wikipedia and find coords that are not added to wikidata.
- Will only add a coord if the GND is a geographical place
- Will only add a coord if there is no existing coord
- Will add a reference for the coord added (linking to the wikipedia item)
- Will only add a coord if the precision is better than 1 degree
- Will always add coords with a precision
- Will only extract a coord from a page if there is only 1 coord on the page.
·addshore· talk to me! 13:49, 3 August 2013 (UTC)[reply]
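The amended conditions above can be read as a single yes/no check per page. A minimal Python sketch under that reading follows. The function name and argument names are hypothetical, and how the bot actually determines the GND type or an existing coordinate is not shown here.

```python
def should_add_coordinate(is_geographic_place, has_existing_coord, page_coords):
    """Decide whether a coordinate may be imported for one item.

    page_coords is a list of (lat, lon, precision) tuples found on the page,
    with precision in degrees or None when it could not be determined.
    """
    if not is_geographic_place:   # GND must be a geographical place
        return False
    if has_existing_coord:        # never overwrite an existing coord
        return False
    if len(page_coords) != 1:     # exactly one coord on the page
        return False
    precision = page_coords[0][2]
    # A precision must always be present and must be better than 1 degree.
    return precision is not None and precision < 1.0


print(should_add_coordinate(True, False, [(48.7775, 11.431111111111, 1 / 3600)]))
```

The reference to the source Wikipedia is added alongside the coordinate and is therefore not part of this check.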