Wikidata:Requests for comment/Sitelinks with fragments
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Closing this RfC, as there seems to be a majority consensus against the solution. Several participants advocate a solution based on redirects, but this has some serious drawbacks, as described in the discussion. Jeblad (talk) 16:18, 22 May 2015 (UTC)
An editor has requested the community to provide input on "Sitelinks with fragments" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.
THIS RFC IS CLOSED. Please do NOT vote or add comments.
This is a rough summary of Jeblad's original proposal below.
- Allow a fragment identifier to be added to sitelinks. Example: Bonnie Parker (Q2319886): enwiki → Bonnie and Clyde#Bonnie Parker
- Existence of the corresponding anchor (section) on the target page should not be required, but a warning should be issued if it does not exist.
- Allow a single page to be sitelinked from multiple items, iff the sitelink fragment identifiers differ, for example:
A similar proposal was discussed briefly in RfC/One vs. several sitelink-item correspondence#Allowing use of anchored interwiki sitelinks.
Petr Matas 14:09, 14 December 2014 (UTC)
Original proposal
How can we make sitelinks behave more as expected on Wikipedia and other sites? The idea behind the current behavior at Wikidata is to unwind all redirects to get to the correct item, that is, the final page pointed to by any previous redirects. This is to verify that we do in fact refer to a single item. The reasoning behind this is to keep things simple: we should only refer to one item on one page. This, however, creates some structural problems on the Wikipedia projects, where content about several items can be collected on a single page. There is also nothing in the linked data model itself that says we cannot have several items on a single page; we have just chosen to do it like that.
In a linked web, a page with several items can be expressed by using fragment identifiers, and this is also described as one of the main methods of linkage in a semantic web context. More specifically, this is described as a "Hash URI", in contrast to what we do, which is using 303s to redirect to different pages.[1] It is important to note that we do not link to a page with semantic data when we use a sitelink; we point to a content page. (It seems we have lost the XSD for our sitelinks.)
In the usual semantic web it is not necessary for the hash URI (fragment) to point to a specific section; it is left to the user to infer how to use it. In the Wikidata context it could be necessary to verify that the hash URI (fragment) is an actual subsection, but note that this has its own problems. If there is a fragment on the URI, it is nevertheless a new identifier and should be treated as such.
The correct behavior would then be to unwind the redirects but keep any hash (fragment) found during the process, and add that back to the URI for the final page. If there are two URIs to the same page, but with different hashes (fragments), then the two URIs are different identifiers. For the purpose of identifying the internal item they are the same, but for identifying an external entity they are not.
For example, Bonnie_and_Clyde#Bonnie is not the same as Bonnie_and_Clyde#Clyde for the purpose of referring to the external entity, but both refer to the common internal concept Bonnie_and_Clyde. Note that in this example there are internal items for Bonnie and Clyde (Q219937) and for each of the criminals, Bonnie Parker (Q2319886) and Clyde Barrow (Q3320282).
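The unwinding rule in the proposal can be sketched in a few lines of Python. This is a minimal illustration, not Wikibase code: the redirect table, the function names split_fragment and unwind, and the rule that the first fragment encountered wins are all assumptions made for the example.

```python
# Hypothetical redirect table: redirect title -> target link,
# where the target may itself carry a fragment.
REDIRECTS = {
    "Bonnie Parker": "Bonnie and Clyde#Bonnie Parker",
    "Bonnie & Clyde": "Bonnie and Clyde",
}


def split_fragment(link):
    """Split 'Title#Fragment' into (title, fragment); fragment is None if there is no '#'."""
    title, sep, fragment = link.partition("#")
    return title, (fragment if sep else None)


def unwind(link, redirects, max_hops=10):
    """Follow redirects to the final page while keeping a fragment.

    Assumption: the first fragment seen wins, so a fragment written on the
    sitelink itself is never replaced by one picked up from a redirect.
    """
    title, fragment = split_fragment(link)
    seen = set()
    for _ in range(max_hops):
        if title not in redirects or title in seen:
            break  # final page reached, or a redirect loop
        seen.add(title)
        title, redirect_fragment = split_fragment(redirects[title])
        if fragment is None:
            fragment = redirect_fragment
    return title if fragment is None else f"{title}#{fragment}"


a = unwind("Bonnie Parker", REDIRECTS)
b = unwind("Bonnie and Clyde#Clyde Barrow", REDIRECTS)
print(a)  # Bonnie and Clyde#Bonnie Parker
print(b)  # Bonnie and Clyde#Clyde Barrow
# Different identifiers for the external entities ...
print(a != b)  # True
# ... but the same page, and hence the same internal page lookup.
print(split_fragment(a)[0] == split_fragment(b)[0])  # True
```

The last two lines show the distinction drawn in the proposal: the two links are different identifiers, yet they resolve to the same page.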
It should be possible to add a fragment to the URI given for a sitelink, and the fragment should be kept. It should not be an error to add a fragment that does not exist on the final page, but it could trigger a warning to the user. By not enforcing that a section named after the hash (fragment) exists in the document, the behavior is more in line with what is expected, and it also allows hash URIs to list-like pages. It will also make the processing more efficient.
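A minimal sketch of the warn-but-accept behavior, assuming the set of anchors on the final page is already known. The name check_fragment is hypothetical, and how the anchors are collected is left open.

```python
def check_fragment(sitelink, anchors_on_page):
    """Return warnings, never errors, for the fragment of an already unwound sitelink.

    anchors_on_page is assumed to be the set of anchor names present on the
    final page; a missing anchor produces a warning but the link is kept.
    """
    title, sep, fragment = sitelink.partition("#")
    if sep and fragment not in anchors_on_page:
        return [f"No anchor '{fragment}' on '{title}'; the sitelink is saved anyway."]
    return []


anchors = {"Bonnie Parker", "Clyde Barrow"}
print(check_fragment("Bonnie and Clyde#Bonnie Parker", anchors))  # []
print(check_fragment("Bonnie and Clyde#Bonnie", anchors))  # one warning, link still accepted
```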
Accepting hash URIs will have some consequences. For example, EntityData must strip them off before handling the item, but must keep the fragment during the 303 redirect and content negotiation. Other special pages that do redirects should reapply the fragments in the redirect. Lua should be able to use hash URIs to refer to the item, but the fragment should be silently stripped off and the data returned should be the item itself. Later on there could be several items on a page, and then the interpretation of the resulting structure is left to the Lua scripts, that is, to the user.
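A hypothetical sketch of stripping and reapplying the fragment in an EntityData-style 303 step. The handler name, the Accept-to-extension table and the default format are assumptions; only the Special:EntityData URL pattern is taken from Wikidata as it exists. Browsers never send the fragment to the server, and HTTP already lets a redirect inherit the original fragment when the Location header has none, so explicit reapplication mainly matters where the full URI arrives as a string, for example as a parameter to a special page.

```python
from urllib.parse import urldefrag

# Hypothetical mapping from Accept header to an EntityData file extension.
FORMATS = {"application/json": "json", "text/turtle": "ttl", "application/rdf+xml": "rdf"}


def entity_data_redirect(concept_uri, accept):
    """Look the item up without the fragment, then reapply the fragment
    to the Location of the 303 redirect."""
    base, fragment = urldefrag(concept_uri)
    item_id = base.rstrip("/").rsplit("/", 1)[-1]  # the lookup never sees the fragment
    ext = FORMATS.get(accept, "json")  # assumed default format
    location = f"https://www.wikidata.org/wiki/Special:EntityData/{item_id}.{ext}"
    if fragment:
        location += "#" + fragment
    return 303, location


print(entity_data_redirect("http://www.wikidata.org/entity/Q219937#Bonnie_Parker", "text/turtle"))
# (303, 'https://www.wikidata.org/wiki/Special:EntityData/Q219937.ttl#Bonnie_Parker')
```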
Two or more external pages on the same site might be referred to by the same item if all but one of them use a hash URI (fragment), but if any two or more use the same fragment (or none), the linkage should be rejected.
A single page can be referred to by sitelinks on two or more different items if all but one of them have a hash URI (fragment) and those fragments all differ, but if any two or more use the same fragment (or none), the linkage should be rejected.
A single page cannot be referred to by more than one sitelink if they differ only in the hash URI (fragment).
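One reading of these three rules as a validation pass, sketched in Python. Assumptions: sitelinks are (item, site, title, fragment) tuples with fragment None when absent, a missing fragment counts as a value so two hash-less links collide, and the third rule is read as applying to sitelinks of the same item. The name find_conflicts is hypothetical.

```python
from collections import defaultdict


def find_conflicts(sitelinks):
    """Return reasons to reject a set of sitelinks (empty list if all are acceptable).

    sitelinks: iterable of (item, site, title, fragment), fragment None when absent.
    """
    by_item_site = defaultdict(list)   # rule 1: one item, several pages on one site
    by_page = defaultdict(list)        # rule 2: one page, several items
    by_item_page = defaultdict(list)   # rule 3: one item, same page more than once
    for item, site, title, fragment in sitelinks:
        by_item_site[(item, site)].append(fragment)
        by_page[(site, title)].append(fragment)
        by_item_page[(item, site, title)].append(fragment)

    def has_duplicate(fragments):
        # None is a value too, so two hash-less links count as a duplicate.
        return len(fragments) != len(set(fragments))

    problems = []
    for (item, site), fragments in by_item_site.items():
        if has_duplicate(fragments):
            problems.append(f"{item} repeats a fragment (or its absence) on {site}")
    for (site, title), fragments in by_page.items():
        if has_duplicate(fragments):
            problems.append(f"{site}:{title} is linked twice with the same fragment (or none)")
    for (item, site, title), fragments in by_item_page.items():
        if len(fragments) > 1:
            problems.append(f"{item} links {site}:{title} more than once")
    return problems


OK = [
    ("Q219937", "enwiki", "Bonnie and Clyde", None),
    ("Q2319886", "enwiki", "Bonnie and Clyde", "Bonnie Parker"),
    ("Q3320282", "enwiki", "Bonnie and Clyde", "Clyde Barrow"),
]
print(find_conflicts(OK))  # []
# Adding a second hash-less link to the same page breaks rules 2 and 3.
BAD = OK + [("Q2319886", "enwiki", "Bonnie and Clyde", None)]
print(len(find_conflicts(BAD)))  # 2
```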
There is a simplified version that is very tempting but has some unfortunate side effects. If the title string of the page holding the redirect is the same as the fragment string, the unwinding of the redirects can halt there and the URI of that page is returned. This then becomes the canonical URI used as the sitelink. The unfortunate side effect is that the real external page will not reach the correct page on Wikidata through its implicit backlink.
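A minimal sketch of the simplified variant, assuming a redirect table that maps a redirect title to its target link. The data and the function name unwind_simplified are hypothetical.

```python
# Hypothetical redirect table: redirect title -> target link (possibly with a fragment).
REDIRECTS = {
    "Bonnie Parker": "Bonnie and Clyde#Bonnie Parker",
    "Bonnie & Clyde": "Bonnie and Clyde",
}


def unwind_simplified(title, redirects, max_hops=10):
    """Follow redirects, but stop at a redirect whose own title equals the
    fragment it points to; that redirect page then becomes the sitelink."""
    seen = set()
    for _ in range(max_hops):
        target = redirects.get(title)
        if target is None or title in seen:
            return title  # final page reached, or a redirect loop
        seen.add(title)
        target_title, _, fragment = target.partition("#")
        if fragment == title:
            return title  # halt: keep the redirect page itself
        title = target_title
    return title


print(unwind_simplified("Bonnie Parker", REDIRECTS))   # Bonnie Parker
print(unwind_simplified("Bonnie & Clyde", REDIRECTS))  # Bonnie and Clyde
```

The first result shows the side effect: the sitelink names the redirect, so the article Bonnie and Clyde itself carries no link back to the Bonnie Parker item.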
– The preceding unsigned comment was added by Jeblad (talk • contribs).
Main arguments in favour of sitelinks with fragments are:
- The new model will solve a lot of the remaining conflicts on Wikipedia
- It is a standard way to handle such problems, as described by the W3C.[1]
Main arguments against sitelinks with fragments are:
- The current model is simple and pretty obvious, so why change it?
- Erroneous linkage through the hash URI (fragment) will be less visible; some maintenance tool is needed.
- It is not obvious how it will impact reuse of data from items, especially through Lua.
– The preceding unsigned comment was added by Jeblad (talk • contribs).
- After this change the current method of articles reusing data from their unique linked wikidata item won't work. All data will have to be fetched using Lua. – The preceding unsigned comment was added by Filceolaire (talk • contribs).
- This is only partially correct. It will still be possible to use the unique ids, but it will be possible to use other linked ids too. The default will be the item identified by the sitelink without a hash identifier, which covers the complete article. Jeblad (talk) 19:30, 13 November 2014 (UTC)
This will maintain unique URIs, which is important, but they will then point into different parts of a page on an external site. The external page will see multiple items on Wikidata, which is a problem. An item on Wikidata will only be able to hold one URI to the external site, which is important. Special:ItemByTitle can return several items (it redirects to a single item now), but Special:GoToLinkedPage returns only one, which is similar to the situation now. That has consequences for wbgetitems, but I'm not sure that functionality is in use, and also for wblinktitles, which is more serious as it is used by edit links (or at least it was supposed to be used by that gadget). Locally on the external page, Lua will see multiple items, but if the hash-less URI is listed first it will probably be what is expected. Client-side storage of the item id is a problem, as it holds only a single item now. Jeblad (talk) 10:36, 11 November 2014 (UTC)
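To illustrate the two lookups described here, the following hypothetical sketch keeps sitelinks as (item, site, title, fragment) tuples. The functions items_for_page and linked_page are illustrative stand-ins, not the real Special:ItemByTitle or Special:GoToLinkedPage implementations: the first returns every item linked to a page, hash-less one first; the second returns the single page URI an item holds for a site.

```python
# Illustrative sitelinks: (item, site, title, fragment), fragment None when absent.
SITELINKS = [
    ("Q219937", "enwiki", "Bonnie and Clyde", None),
    ("Q2319886", "enwiki", "Bonnie and Clyde", "Bonnie Parker"),
    ("Q3320282", "enwiki", "Bonnie and Clyde", "Clyde Barrow"),
]


def items_for_page(site, title, sitelinks):
    """Every item whose sitelink points at (site, title), hash-less link first."""
    matches = [
        (fragment is not None, fragment or "", item)
        for item, s, t, fragment in sitelinks
        if (s, t) == (site, title)
    ]
    return [item for _, _, item in sorted(matches)]


def linked_page(item, site, sitelinks):
    """The single page URI an item holds for a site, fragment included, or None."""
    for i, s, title, fragment in sitelinks:
        if (i, s) == (item, site):
            return title if fragment is None else f"{title}#{fragment}"
    return None


print(items_for_page("enwiki", "Bonnie and Clyde", SITELINKS))
# ['Q219937', 'Q2319886', 'Q3320282']
print(linked_page("Q2319886", "enwiki", SITELINKS))
# Bonnie and Clyde#Bonnie Parker
```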
- 1. Having three items linked to one Wikipedia article is confusing. The current system is that an article can fetch data from the Wikidata item that has a sitelink to that article. If an article has sitelinks from three Wikidata items (from "Bonnie and Clyde" to the page, and from "Bonnie Parker" and "Clyde Barrow" to #sections of the page), then which of these items should the article use to fill its infoboxes? Filceolaire (talk) 20:23, 12 November 2014 (UTC)
- On 1, the default and correct item is the one with the hash-free sitelink, "Bonnie and Clyde". The items linked with hash URIs will be secondary and can be queried explicitly with a more lightweight method. It is also worth noting that the linkable sections are not essential here; the important thing is the hash URI as an identifier. The group "Bonnie and Clyde" does in fact consist of two entities, "Bonnie Parker" and "Clyde Barrow", and those two subentities should be identifiable. (Note: subentities, not sections.) Jeblad (talk) 20:05, 13 November 2014 (UTC)
- In many cases it may also be appropriate to just fetch all claims from all linked items. Petr Matas 00:26, 14 December 2014 (UTC)
- 2. Links to #sections don't help with the "hatmaker"/"hatmaking" problem. Where some Wikipedias have an article on "Hatmakers" and other Wikipedias have an article on "Hatmaking", the links between these can easily be resolved with sitelinks to redirects, but links to #sections are not as easy and don't give any additional functionality. Filceolaire (talk) 20:23, 12 November 2014 (UTC)
- On 2, merging "hatmakers" (list of people making hats) and "hatmaking" (making of hats) would be an error; those are different concepts. If we assume that those two concepts are in fact the same, which they are not, then the problem would be a missing merge of the articles on Wikidata, not two different articles on Wikipedia. Actually, "hatmakers" would use hash URIs for each of the hatmakers, and that would be a valid solution to the list problem. Also, the hatmakers on Wikidata could be stored on a single page, and this has been discussed. An item resides on a page, but it does not use the page id as its own identifier just because of this. Jeblad (talk) 20:05, 13 November 2014 (UTC)
- I think that neither sitelinks with fragments nor sitelinks to redirects are a good solution to this problem. In my opinion, this is better solved by a property, which links similar items to each other. Nevertheless, sitelinks with fragments are still useful in other scenarios. Petr Matas 00:26, 14 December 2014 (UTC)
- 3. Sitelinks to redirect pages can do the same job. This change will enable the "Bonnie Parker" item to have sitelinks to specific sections of the "Bonnie and Clyde" articles. As far as I can see, this is nearly the same functionality we would get by sitelinking to a "Bonnie Parker" redirect page that links to the "Bonnie and Clyde" article. Linking to #sections does not help the "Bonnie and Clyde" article link to Wikipedias that only have articles for "Bonnie Parker" and "Clyde Barrow". Filceolaire (talk) 20:23, 12 November 2014 (UTC)
- On 3, sitelinks to redirect pages are nearly the same, but they will not point directly to the correct article on Wikipedia, and they will create maintenance problems because the item on Wikidata will not know about the final page on Wikipedia. That means unwinding all the links each time something is moved. It also means that data from the redirect must be queried by a heavier caching mechanism using the title of the redirect, that is, the same mechanism as is planned for lookup of external items.
- When the present system was made, I concluded that marking some redirects as "important", so that they would not be unwound, was not a good idea. A very important thing here is that by using redirects we will create, and use, two different URIs that both identify the same concept. It is important here that the page describing the entity (the rdfs:seeAlso) is not the redirect; it is the actual page (the article) on Wikipedia. I think that is still the case, but perhaps I'm wrong. Jeblad (talk) 20:05, 13 November 2014 (UTC)
- Yes, Wikidata should make sure that the same concept on a single site is not linked from multiple items, otherwise we will end up in a big mess. This is done easily with fragments, because the information that distinguishes between the concepts is stored directly in the sitelink, i.e. in Wikidata. It will be difficult to maintain this if sitelinks to redirects are common. Note that targets of redirects can be changed and it would be difficult for Wikidata to keep track of that. Petr Matas 00:26, 14 December 2014 (UTC)
- 4. This proposal could help to link "Bonnie and Clyde" to "Bonnie Parker". The Wikipedia articles will have links from three different Wikidata items. They could therefore have three sets of sitelinks in the left-hand column: first a list of links taken from the "Bonnie and Clyde" Wikidata item, then a list of the links on the "Bonnie Parker" item, then a list of links from the "Clyde Barrow" item. Unfortunately, the links to the "Bonnie and Clyde" articles will appear in all three of these lists, so there needs to be a way to resolve that.
- On the whole I am inclined to Oppose this, as the benefits seem marginal compared to linking to redirects. Filceolaire (talk) 20:23, 12 November 2014 (UTC)
- On 4, there will be several links if there are several identifiable entities described in the same article. This is somewhat similar to the present situation when local sitelinks exist.
- All in all, I can't see that any of the arguments against the solution are valid. Jeblad (talk) 20:05, 13 November 2014 (UTC)
- I think that only one interwiki link per language should be presented: the most relevant one. The items connected to the page by links with fragments should be used only if the given language is not found in the primary item (the one connected by the link without a fragment). I see the purpose of interwiki links as "get me to anything related to start from in the given language", so it is up to internal links at the target site to link to the related concepts. Petr Matas 00:26, 14 December 2014 (UTC)
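Petr Matas's rule of one interlanguage link per language, taken from the primary item first and from the fragment-linked items only as a fallback, can be sketched as follows. The sitelink data is illustrative, not copied from the live items, and interlanguage_links is a hypothetical name.

```python
# Illustrative sitelinks per item (not copied from the live items).
SITELINKS_OF = {
    "Q219937": {"enwiki": "Bonnie and Clyde", "dewiki": "Bonnie und Clyde"},
    "Q2319886": {
        "enwiki": "Bonnie and Clyde#Bonnie Parker",
        "dewiki": "Bonnie und Clyde#Bonnie Parker",
        "frwiki": "Bonnie Parker",
    },
    "Q3320282": {
        "enwiki": "Bonnie and Clyde#Clyde Barrow",
        "frwiki": "Clyde Barrow",
    },
}


def interlanguage_links(own_site, primary, secondary, sitelinks_of):
    """At most one link per site: from the primary (hash-less) item if it has one,
    otherwise from the first secondary item, in the given order, that does."""
    links = {}
    for item in [primary, *secondary]:
        for site, link in sitelinks_of.get(item, {}).items():
            if site != own_site and site not in links:
                links[site] = link
    return dict(sorted(links.items()))


print(interlanguage_links("enwiki", "Q219937", ["Q2319886", "Q3320282"], SITELINKS_OF))
# {'dewiki': 'Bonnie und Clyde', 'frwiki': 'Bonnie Parker'}
```

Links from secondary items that point back into the same article are dropped because the primary item already supplies that language, which addresses the duplication raised in point 4. The cost of the rule is visible too: only one of the two French articles can be shown.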
Weak support, but I think that the use of a property, which links similar items to each other, should be implemented first. After that we should see if the need for sitelinks with fragments persists. Petr Matas 00:26, 14 December 2014 (UTC)
Oppose. Don't link to page sections; link to redirects. For example, instead of linking Bonnie Parker (Q2319886) to en:Bonnie and Clyde#Bonnie Parker, link to en:Bonnie Parker. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:57, 22 December 2014 (UTC)
- See the reactions to Filceolaire's third comment above for explanations why sitelinks to redirects are not a good option. Petr Matas 17:30, 22 December 2014 (UTC)
Possible alternative: The proposed Fragment identifier property. Petr Matas 19:23, 22 December 2014 (UTC)
Support: Wikidata is ... collaborative, multilingual, secondary database, collecting structured data to provide support for Wikipedia, Wikimedia Commons, the other Wikimedia projects, and well beyond that. Wikidata should serve the relations to fragments. Wikidata should not say that these relations have to be managed outside, in several projects and language versions, by redirects or atomised articles. When articles contain fragments with clear relations to items, it is confusing to have no sitelinks to them. --Diwas (talk) 20:21, 28 December 2014 (UTC)
Strong oppose: The proposal requires that subsection names in an article either become unmodifiable or be tracked, because they provide the identity that we link to. But subsection names can be composed nontrivially, e.g. come from a transcluded page, from Lua, or from a parametrized template. Some pages, like this one, don't provide the subsections that we are looking for, so we wouldn't actually solve the problem anyway. We still need to be able to link the Bonnie and Clyde page, regardless of whether subsections are linked to. In short, it does not really solve the problem, but it certainly makes the whole situation far more complicated than the "one entity per article" rule that we currently have. --Denny (talk) 23:57, 5 January 2015 (UTC)
- I think that this is not true. The existence of the corresponding section is not strictly required, because the primary purpose of sitelinks with fragments is to allow multiple items to link to a single page in a controlled way. We should read the fragment identifier #Bonnie as "the passage about Bonnie". If a corresponding anchor exists, we get the bonus of the browser scrolling to it automatically. So renaming a section is not really a problem. Petr Matas 00:20, 6 January 2015 (UTC)
- In that case I would be even more strongly opposed. What's the point of having an arbitrary string as a fragment identifier on the sitelink? Why have it at all? Just leave the sitelink out and make an explicit connection to the item of the page where you would put the fragment identifier; that sounds much cleaner to me. --Denny (talk) 23:21, 6 January 2015 (UTC)
- A completely arbitrary string is useless, of course. The point of a meaningful fragment identifier is
- to make the browser open the corresponding section if it exists,
- to decrease the probability of the same concept in one page being linked from multiple items (the same page can be linked from multiple items, but the fragment identifiers must differ).
- But I agree that linking to another item, for example using a property (please vote!), is cleaner. Petr Matas 12:47, 7 January 2015 (UTC)
- Note that the sitelinks as they are now do not track changes to the article title, nor are they unmodifiable. There are, however, bots trying to track such changes. It is not really necessary to track the fragment identifiers; they act as ids and just happen to be similar to section headers. A page with a given fragment should provide information about the entity identified by the fragment, but it is not necessary to associate the fragment with any specific section.
- Lately I have been going through several items, and I am starting to doubt the idea that there are as many 1-to-1 relations among the sitelinks as previously stated. Very often the relations, and thus the similarities, are quite superficial: it can be municipalities, cities and islands that just happen to be at the same place, or it can be biographies that are more about some office held by the person than about the person. This makes me question whether sitelinks are usable for anything except stating a superficial similarity. 109.247.163.112 12:13, 26 January 2015 (UTC)
- I think that your claim that "the sitelinks as they are now does not track changes to the article title" is wrong. In my experience, if you rename a Wikipedia article, the sitelink is updated immediately. Concerning your approach to fragment identifiers, I agree. Concerning the superficial connections between articles and items, I think that they come from the fact that we want an article to have as many interwikis as possible, even though the topics are slightly different. Fixing this requires the item to be split into multiple items linked to each other, so that all the interwikis are still available. Petr Matas 13:33, 25 April 2015 (UTC)
- The discussion about arbitrary identifiers is interesting. The URI, when used as an identifier, is the complete string including the fragment identifier. URIs that carry information are sometimes described as smart or intelligent identifiers. Much of the linked/semantic data world has split into two groups: one where such smart identifiers are considered bad, and one where they are considered good. Roughly, it splits along the lines of linked data (which uses URLs) and semantic data (which uses URNs). Both URLs and URNs can be described as URIs, or actually as IRIs, but let's not nitpick. Identifiers in Wikidata are opaque URIs as seen from the outside; they do not carry any information except being identifiers. Links on Wikipedia do carry information, but you must know how to interpret that information (that is, know the language and its grammar). Even if a link does carry additional information, we can very often refuse to use it. We simply let the URI become an opaque identifier for our own internal use. I believe we should handle all links as opaque, even links from Wikipedia, and even fragment identifiers on those links. Jeblad (talk) 15:46, 22 May 2015 (UTC)
Oppose. The better way to go is sitelinks to redirects. Then there will be a page on the wiki (even if only a redirect page) that exclusively points 1:1 to the Wikidata item. And redirects are more stable, and make what's going on more transparent, whereas people change the headers that fragments would rely on all the time. Jheald (talk) 19:02, 24 January 2015 (UTC)
Oppose ·addshore· talk to me! 16:35, 17 February 2015 (UTC)
Oppose. Links to redirects are clearer and more stable. --Infovarius (talk) 13:14, 6 March 2015 (UTC)
- Is there consensus in all projects to create redirects for all concepts covered within a broader page? --Diwas (talk) 22:42, 6 March 2015 (UTC)
Weak oppose. As noted above, these anchors are as modifiable as they can be. Until the day that Wikipedia's sections are actually separate pages transcluded into one article (in which case they could be tracked), similar to the way it's done on Wikisource, I can't support this. And I'm assuming that WP won't be changing its structure anytime soon. Hazmat2 (talk) 17:55, 27 April 2015 (UTC)
Oppose. This would increase inconsistency. --T.seppelt (talk) 20:17, 3 May 2015 (UTC)
Oppose. The Wikipedias can instead mark their sections with the corresponding Wikidata ID; that way software can connect the items. FreightXPress (talk) 18:42, 12 May 2015 (UTC)
- Interesting proposal (although that should be proposed on Wikipedia, not here). I think it could also be useful if, each time someone clicks on a Wikipedia redirect on a Wikipedia site, a pop-up showed an *alternate link* to an available Wikidata item for that specific redirect, instead of following the redirect to its destination (wherever that may be). I am sometimes confused after clicking on a Wikipedia blue link that takes me to some huge Wikipedia mega-page that is a catch-all for all sorts of things (like the en:Insurance Wikipedia page, for example). On second thought, it might be a good idea to passively link redirects *to* Wikidata items *from* Wikipedia in such a way that users could access them from the "What links here" feature. This would not need any extra functionality from Wikidata, and for experienced Wikipedians it could become a fruitful way to encourage more articles to be written. Jane023 (talk) 08:07, 13 May 2015 (UTC)
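FreightXPress's alternative could be prototyped by scanning wikitext for a marker placed under each heading. The marker template {{section item|Q2319886}} and the function name section_items are hypothetical; no such template is assumed to exist on any wiki. The sketch reads raw wikitext only, so headings produced by templates or Lua, a concern raised earlier in the discussion, would be missed.

```python
import re

# Wikitext heading such as "== Bonnie Parker ==".
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")
# Hypothetical marker template; it is not assumed to exist on any wiki.
MARKER = re.compile(r"\{\{\s*section item\s*\|\s*(Q\d+)\s*\}\}", re.IGNORECASE)


def section_items(wikitext):
    """Map each section heading to the item ID of the first marker inside it."""
    result = {}
    current = None
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(2)
            continue
        marker = MARKER.search(line)
        if marker and current is not None and current not in result:
            result[current] = marker.group(1)
    return result


SAMPLE = """Text about the pair.
== Bonnie Parker ==
{{section item|Q2319886}}
Text about Bonnie.
== Clyde Barrow ==
{{section item|Q3320282}}
Text about Clyde.
"""
print(section_items(SAMPLE))
# {'Bonnie Parker': 'Q2319886', 'Clyde Barrow': 'Q3320282'}
```

Because the item ID sits in the section body rather than in the heading text, renaming a heading would move the key but not lose the link to the item.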
Multiple sitelinks per item and site
The proposal mentions that "two or more external pages on the same site might be referred to by the same item". I think that this would be harmful. Let us have two different articles on the same site. Either they are about two different concepts and each of them should link to a different item, or they should be merged. Petr Matas 22:28, 13 December 2014 (UTC)
- @Petr Matas: A page can hold a description of several entities, but in Wikidata we have done a simplification, so we assume a single entity is described on a single external page. Said another way, we assume that the URL to the page is the same as the URI for the entity. Jeblad (talk) 19:44, 4 January 2015 (UTC)
- I am going to play dumb to clear up any misunderstandings: you say "on Wikipedia, one page → multiple items is a common practice", right? Now how would Wikidata's support for one item → multiple pages (i.e. the other way around) help with this? Petr Matas 23:36, 5 January 2015 (UTC)
- Sorry, my description is wrong. The first part of my first sentence is implicitly about Wikidata; the second part is about Wikipedia. A page on Wikidata can hold information about several entities, each as an item, but it does not for now. On Wikipedia, only a single entity is normally described on each page, because it is an encyclopedia. So long as we maintain a 1-to-1 relation between an item and a page on a Wikipedia project, all is good, but on Wikipedia there are some pages that describe several entities on one and the same page. That is when we run into trouble with this simple model.
- It is worth noting that the redirect solution advocated by some has equally big problems, if not bigger, as it is easy to allow multiple links to one and the same entity this way, but difficult to maintain consistency. Jeblad (talk) 16:11, 22 May 2015 (UTC)
Elsewhere the proposer says the opposite: "An item on Wikidata will only be able to hold one URI to the external site". Petr Matas 22:45, 13 December 2014 (UTC)
- @Petr Matas: Note that this is about the URI and not the URL. Because the URI includes the hash, there can be identifiers for each of the entities described on the page. If the page only holds a single description, then the URI and the page URL can be assumed to be the same. Jeblad (talk) 19:44, 4 January 2015 (UTC)
- According to my knowledge:
- A URL is a URI that says how to access the resource. For example, http: is a URL scheme and magnet: is not, but both are URI schemes.
- Every URI (be it a URL or not) can contain a fragment identifier.
- Petr Matas 23:36, 5 January 2015 (UTC)
- By URI I mean the identifier including the fragment, and by URL our present use of links to Wikipedia, which do not include the fragment. Jeblad (talk) 16:13, 22 May 2015 (UTC)
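Python's standard urllib.parse.urldefrag makes the URI/URL distinction in this exchange concrete: the full string, fragment included, serves as the identifier, while the part without the fragment is the page that is actually fetched. The example URIs are illustrative.

```python
from urllib.parse import urldefrag

uris = [
    "https://en.wikipedia.org/wiki/Bonnie_and_Clyde",
    "https://en.wikipedia.org/wiki/Bonnie_and_Clyde#Bonnie_Parker",
    "https://en.wikipedia.org/wiki/Bonnie_and_Clyde#Clyde_Barrow",
]

# Treated as opaque identifiers (URIs), the three strings are all different ...
identifiers = set(uris)
# ... while the locations actually fetched (URLs without the fragment) are one page.
pages = {urldefrag(u).url for u in uris}
print(len(identifiers), len(pages))  # 3 1

# A fragment is not limited to URL schemes such as https:; any URI may carry one.
print(urldefrag("magnet:?xt=urn:btih:0123abcd#part-2").fragment)  # part-2
```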