Wikidata:Requests for comment/Administative divisions and populated places
An editor has requested the community to provide input on "Administative divisions and populated places" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.
If you have an opinion regarding this issue, feel free to comment below. Thank you!
THIS RFC IS CLOSED. Please do NOT vote nor add comments.
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Stale, not enough input for consensus. John F. Lewis (talk) 15:11, 21 July 2015 (UTC)
As already said in Wikidata:Project chat#Population problem, Wikipedia in almost all cases merges the populated place and the administrative division into one article. For example, Assisi (Q20103) is both an administrative division, a commune of Italy (Q747074), and a populated place (the city itself). The administrative division has 27377 inhabitants because it includes many frazione (Q1134686), while the populated place has only 3732. The same problem applies to altitude etc., and in the future we will have the same problems with boundary coordinates. This problem affects millions of items on Wikidata and also hurts the usability of the data, so it is necessary to establish a guideline, backed by strong consensus, that applies to all items (feel free to correct or modify) --Rippitippi (talk) 21:12, 10 February 2014 (UTC)
My proposals are:
The same item for ppl (populated place) and administrative division, with different properties, to maintain adherence with the majority of Wikipedia entries
Where a WP page discusses both, it can be considered to be about the larger unit, with the smaller unit discussed as part of it. Statements (boundaries, population etc.) should only be about the larger unit. Filceolaire (talk) 21:35, 22 February 2014 (UTC)
Different items for ppl and admin division, but there will be confusion with links to Wikipedia: some will point to the admin division, others to the ppl
Agree: first split items into separate parts (e.g. municipality and village with the same name) before adding population figures, even if there is no Wikipedia article for the village. Other properties may also differ between a municipality and the village of the same name, and this approach solves that too. Michiel1972 (talk) 13:53, 11 February 2014 (UTC)
- If we can make different statements about the various components then we need different items to host those statements. Sorting the sitelinks will have to follow.
Comments
Take a look at Domback (Q1879056). It's not an item about an adm unit and a ppl with the same name, but rather about a ppl divided between two statistical entities. They have the same name, but the statistical office separates them as "North" and "South". I think this can also be used for cases such as Assisi. Instead of more properties and qualifiers to separate the statements, we keep one item @ Wikidata per item @ Statistics Whatever. And it does not make any difference whether there are two, three or 12 entities in the same article, or what kind of entities they are. The drawback is that Wikidata may not resolve the interwiki links if one project has one article while others have two. But such problems already exist, with Cain and Abel, Bonnie and Clyde etc. -- Lavallen (talk) 06:50, 11 February 2014 (UTC)
- So for about a million items we must maintain 3 items each? It is very difficult to maintain database integrity --Rippitippi (talk) 13:59, 11 February 2014 (UTC)
- Maybe even more than 3 sometimes....
- What is easiest to maintain: items with several corresponding entries in the databases of the statistical organisations, or a one-to-one relation?
- We have thousands of examples where nlwp and svwp disagree about whether two entities should be in one article or in two. Often nlwp chooses to have more articles than svwp. Svwp then needs to link to items similar to Q1879056, so that we do not need to duplicate the information in two items. And the most important information in items like Q1879056 is has part(s) (P527). Such statements often do not need to be maintained. Once they are set up, they can often stay as they are. -- Lavallen (talk) 14:31, 11 February 2014 (UTC)
- I agree with Lavallen. The most practical solution is to have one item in Wikidata per item defined by the relevant statistics bureau. This way, there would be no need for a qualifier to define the perimeter to which each demographic figure applies, and the link with the source will be easier to maintain. --Casper Tinan (talk) 22:09, 13 February 2014 (UTC)
- If we have statistics about 4 different census designated places then we cannot include this data unless each of these places has its own item. Item splits should follow and match the statements. Filceolaire (talk) 21:35, 22 February 2014 (UTC)
- +1 I also support distinct items. See Langenzenn (Q15724921) (village/ppl) and Langenzenn (Q2230) (municipality). The latter consists of 23 villages, Langenzenn (Q15724921) being one of them. This is imho a quite clean model. -- Felix Reimann (talk) 16:27, 13 April 2014 (UTC)
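The distinct-items model supported above can be sketched roughly as follows, using the Assisi figures quoted at the start of this discussion. This is only an illustration under stated assumptions: the `Item` class and its fields are invented here and are not the Wikibase data model, and `Q_TOWN_PLACEHOLDER` is a hypothetical id, since the discussion names no separate item for the town itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Illustrative stand-in for a Wikidata item (not the Wikibase data model)."""
    qid: str
    label: str
    kind: str                       # "administrative division" or "populated place"
    population: Optional[int] = None
    has_part: list = field(default_factory=list)   # mirrors has part(s) (P527)


# Hypothetical placeholder id: the discussion gives no item for the town alone.
assisi_town = Item("Q_TOWN_PLACEHOLDER", "Assisi", "populated place", population=3732)

# The commune keeps its own population statement and lists the town as a part.
assisi_commune = Item("Q20103", "Assisi", "administrative division",
                      population=27377, has_part=[assisi_town])


def population_outside(parent: Item, part: Item) -> int:
    """Residents of the parent unit who live outside the given part."""
    if part not in parent.has_part:
        raise ValueError(f"{part.qid} is not listed as a part of {parent.qid}")
    return parent.population - part.population


print(population_outside(assisi_commune, assisi_town))
```

With one item per unit, each population figure attaches to exactly one item, so no qualifier is needed to say which perimeter a figure refers to.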
Last year I looked at several open datasets related to administrative hierarchies, with the aim of building a reverse geocoder that can come up with the correct colloquial names for neighborhoods and areas. This requires good data on both the boundaries and the names of places. It is an extremely hard problem, and the open data for it is highly fragmented across many datasets, which add to the difficulty through problematic categorizations, a bias towards formal administrative names rather than the informal names that people actually use in real life, a lack of good boundary data, gaps in completeness and correctness, and data licensing. E.g. OpenStreetMap is surprisingly useless for this problem, since it only includes a few tens of thousands of boundaries for administrative hierarchies worldwide. Foursquare made a rather heroic effort to unite several datasets worldwide, including GeoNames, GeoPlanet and countless open datasets for specific regions. The result is published at quatroshapes.org. I believe Wikidata can contribute in several ways. 1) Where possible, link to other datasets. A good start would be importing GeoPlanet WOEIDs and GeoNames ids on data items where possible. I estimate that approximately 500000 Wikipedia articles are linked directly from the GeoNames dataset already, and mappings to GeoPlanet exist as well. 2) Adopt a categorization similar to GeoPlanet's to distinguish e.g. places that have historical meaning (e.g. the Weimar Republic) from formal names such as Bezirk Pankow, which contains several areas with less formal names. GeoPlanet data has been archived at the Internet Archive and is available under CC 3.0 with attribution, if I remember correctly. 3) Consider supporting GeoJSON for describing boundaries and other geographic features. Coordinates alone have limited use (positioning a map), and good polygon data would be very valuable. 4) Identify problems in the data and seek community support for filling the gaps. For example, good open neighborhood data is available for large parts of North America but very hard to come by for e.g. Germany. There are currently no good datasets with global coverage. Jillesvangurp (talk) 08:08, 28 August 2014 (UTC)
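To illustrate why polygon data is more useful than a single coordinate for reverse geocoding, here is a minimal sketch of a GeoJSON polygon feature and a point-in-polygon test. The square below is made up and is not a real boundary, the `name` property is a hypothetical label, and the `contains` helper is an illustrative ray-casting routine rather than part of any particular library.

```python
import json

# Made-up square, NOT a real boundary. GeoJSON positions are [longitude, latitude],
# and a polygon ring is closed by repeating its first position at the end.
ring = [[13.0, 52.0], [14.0, 52.0], [14.0, 53.0], [13.0, 53.0], [13.0, 52.0]]

feature = {
    "type": "Feature",
    "properties": {"name": "Example area"},   # hypothetical label
    "geometry": {"type": "Polygon", "coordinates": [ring]},
}

# Serialized form, as it could be stored or exchanged.
geojson_text = json.dumps(feature)


def contains(ring, lon, lat):
    """Ray-casting test: is the point inside the closed outer ring?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray running east from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


outer = feature["geometry"]["coordinates"][0]
print(contains(outer, 13.5, 52.5))   # point in the middle of the square
print(contains(outer, 15.0, 52.5))   # point east of the square
```

A reverse geocoder would run such a test against many named polygons, which a lone centre coordinate per place cannot support.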
I think we should use different models for each country, because there are countries where administrative divisions and populated places are the same, and others where national law requires a minimum number of inhabitants for a municipality. For instance, in Spain we have municipalities with three inhabitants without any problem (Illan de Vacas (Q1381577)), but in Denmark municipalities must have a minimum population of 5,000 inhabitants (except for islands). We should use the same common sense that Wikipedia applies to municipalities of different countries. 88.14.100.137 16:21, 6 February 2015 (UTC)