Wikidata talk:Lexicographical data/Archive/2013/09


First and second phases

Let's be pragmatic and just import the interwiki links. This is the easiest and biggest benefit of Wikidata we can deliver on a short schedule, because it is something we're going to do in any case. After that, we can build on the foundation. I think that the second phase should be importing the "see also" markup at the top of pages: it relates to the page, not to words in particular languages. The "see also" targets can be found by a program that tests all possible alternatives and keeps the ones that are real entries in the English Wiktionary; using a database dump would be a smart way to do this. The rest, "all the fancy stuff", can be implemented later step by step, but figuring out how best to do it will probably take time. --Hartz (talk) 13:13, 7 September 2013 (UTC)
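A minimal sketch of such a "see also" finder, assuming a list of page titles has already been extracted from a database dump. Instead of enumerating every possible alternative spelling, it swaps in a simpler technique: titles are bucketed by a normalization key, and the other titles in the same bucket become "see also" candidates. The function names and the choice of what to ignore (case, diacritics, apostrophe style, hyphens, spaces) are hypothetical, not something decided in this thread.

```python
import unicodedata
from collections import defaultdict


def fold(title: str) -> str:
    """Hypothetical normalization key: ignore case, diacritics, apostrophe style, hyphens and spaces."""
    decomposed = unicodedata.normalize("NFD", title)
    # Drop combining marks (e.g. the accent in "é" after NFD decomposition).
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat the typographic apostrophe like the ASCII one.
    unified = no_marks.replace("\u2019", "'")
    return unified.replace("-", "").replace(" ", "").casefold()


def see_also(titles: list[str]) -> dict[str, list[str]]:
    """Map each existing title to the other existing titles sharing its key."""
    buckets: dict[str, set[str]] = defaultdict(set)
    for title in titles:
        buckets[fold(title)].add(title)
    return {title: sorted(buckets[fold(title)] - {title}) for title in titles}
```

For example, `see_also(["resume", "résumé", "Resume", "cat"])["resume"]` gives `["Resume", "résumé"]`, while `"cat"` gets an empty list. Only titles that actually occur in the dump can appear in the output, which matches the idea of keeping only real entries.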

I agree. We still have to decide how we want to use interwiki links: articles do not necessarily have the same title in all languages. Here is a table I made a while back:
Type                     fr          de          en
Articles                 2,337,895   266,965     3,422,938
Redirects                15,119      694         17,218
No interwiki             1,438,154   99,429      1,631,903
Direct interwikis        899,177     167,485     1,790,878
Apostrophe interwikis    784         63          0
Capital interwikis       130         149         54
Other interwikis         938         217         542

In this table, I compared the interwiki links with the article title in each of the three wikis. When the interwiki target is exactly the same as the title, it is counted as Direct. But interwikis can also be written with a different apostrophe, a different capitalization or different punctuation (sometimes it's just an error).
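The comparison described above can be sketched roughly as follows. This is not the script that produced the table: the category order, the set of apostrophe variants and the function names are assumptions. In this sketch, a link that differs in both apostrophe and capitalization falls into "other".

```python
from collections import Counter

# Apostrophe variants treated as equivalent to the ASCII apostrophe (an assumption).
APOSTROPHES = ("\u2019", "\u02bc")


def unify_apostrophes(text: str) -> str:
    for variant in APOSTROPHES:
        text = text.replace(variant, "'")
    return text


def classify(title: str, target: str) -> str:
    """Put an interwiki target into one of the table's categories."""
    if target == title:
        return "direct"
    if unify_apostrophes(target) == unify_apostrophes(title):
        return "apostrophe"
    if target.casefold() == title.casefold():
        return "capital"
    return "other"


def tally(pairs: list[tuple[str, str]]) -> Counter:
    """Count categories over (article title, interwiki target) pairs."""
    return Counter(classify(title, target) for title, target in pairs)
```

For instance, `classify("c'est", "c’est")` is `"apostrophe"` and `classify("Katze", "katze")` is `"capital"`.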

Among articles that do have interwikis, the vast majority (over 99.7% in each of the three wikis) use direct interwikis. In those cases we don't even need to list the page name for every language edition, just the editions that have the article. But what do we do when different communities have different rules for apostrophes, capitalization and punctuation? Darkdadaah (talk) 12:49, 9 September 2013 (UTC)
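The share of direct interwikis can be recomputed from the table with a little arithmetic. This sketch assumes the four interwiki rows are mutually exclusive and divides the Direct count by their sum; the rows do not add up exactly to the "Articles" row, so the figures are only indicative.

```python
# Interwiki rows copied from the table above (fr, de, en Wiktionaries).
TABLE = {
    "fr": {"direct": 899_177, "apostrophe": 784, "capital": 130, "other": 938},
    "de": {"direct": 167_485, "apostrophe": 63, "capital": 149, "other": 217},
    "en": {"direct": 1_790_878, "apostrophe": 0, "capital": 54, "other": 542},
}


def direct_share(row: dict[str, int]) -> float:
    """Fraction of interwikis whose target exactly matches the article title."""
    return row["direct"] / sum(row.values())


for wiki, row in TABLE.items():
    print(f"{wiki}: {direct_share(row):.2%}")
```

This prints 99.79% for fr, 99.74% for de and 99.97% for en.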

I can see 3 solutions:
  1. We write every sitelink as is (like Wikipedia interwiki links).
    • This would be the easiest, but it would be wasteful, since more than 99% of interlinked articles have the same name.
  2. We create special sitelinks so that all languages in the list share the exact same title (i.e. the label is used as the title, so there is no need to write a sitelink title). The different names in other wikis would be handled by local redirects.
    For this we would need a special kind of sitelink. But the worst part would be that it would only move the work from local interwiki bots to local redirecting bots (albeit for a smaller number of articles).
  3. We create special sitelinks so that all languages in the list share the exact same title (label), but we would use normal sitelinks for pages with different article titles.
    We need the same kind of special sitelinks as in the second solution, and we need to be able to easily switch between normal and special sitelinks.
I don't know which is best (maybe there is a better way). Darkdadaah (talk) 13:54, 9 September 2013 (UTC)
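Solution 3 could be modeled roughly as below. The class, fields and methods are hypothetical and do not correspond to any existing Wikibase structure: each item carries a shared label, the set of editions whose page has exactly that title ("special" sitelinks), and ordinary per-edition sitelinks only where the title differs. `set_title` illustrates the easy switch between the two kinds.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WiktionaryItem:
    # Hypothetical model of solution 3, not an actual Wikibase data structure.
    label: str  # shared title used by the "special" sitelinks
    editions: set[str] = field(default_factory=set)  # editions whose title equals the label
    overrides: dict[str, str] = field(default_factory=dict)  # normal sitelinks for differing titles

    def title_on(self, wiki: str) -> Optional[str]:
        """Return the page title on an edition, or None if it has no page."""
        if wiki in self.overrides:
            return self.overrides[wiki]
        if wiki in self.editions:
            return self.label
        return None

    def set_title(self, wiki: str, title: str) -> None:
        """Use a special sitelink when the title matches the label, else a normal one."""
        if title == self.label:
            self.overrides.pop(wiki, None)
            self.editions.add(wiki)
        else:
            self.editions.discard(wiki)
            self.overrides[wiki] = title
```

For example, `item = WiktionaryItem("c'est", {"en", "de"})` followed by `item.set_title("fr", "c’est")` stores a normal sitelink only for fr; `item.title_on("en")` still returns the shared label.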
  Oppose. The proposal explicitly sets out the problems of importing Wiktionary here and the approach this import will take. Plus, there's no rush to start phase 1, since many details still have to be decided and the Wiktionary communities should have their say too. --Sannita - not just another it.wiki sysop 14:35, 9 September 2013 (UTC)
The proposal is not about language links (which are currently in use on Wiktionary) but about linking lexemes, which is completely different. There is of course no way we will reach consensus about lexemes and such anytime soon (see the latest proposals), but language links between Wiktionary pages are child's play compared to that. We can do it quite easily, so why should we wait for something unrelated that is still so far away? Darkdadaah (talk) 15:25, 9 September 2013 (UTC)
Because we'd risk doing the same work twice if we start phase 1 now. See this paragraph: "Wikipedia and Wikidata have basically a 1-to-1 mapping of articles in a Wikipedia to items in Wikidata. This is not the case for this proposal for Wiktionary: [...] Therefore language links for the actual words in Wiktionary should not be moved to Wikidata and provided from there." We could start with the namespaces other than the main one (ns≠0), but I don't see the point of starting with only those namespaces. Sannita - not just another it.wiki sysop 16:59, 9 September 2013 (UTC)
I don't understand then: are you saying that we don't need language links as they exist today on Wiktionaries? Or that they will be replaced by something else? Darkdadaah (talk) 17:24, 9 September 2013 (UTC)
Ok, let's start again.
There has been a long and thorough discussion here about how to include Wiktionary's pages here. Almost immediately the problem of the meaning of words in various languages arose. You already know that there can be several words for the same thing, depending on the language (e.g. "apple", "mela", "Apfel", "manzana", "maçã" and "pomme" have the same meaning), and that the same word can have different meanings, again depending on the language (e.g. "burro" means "butter" in Italian and "donkey" in Spanish).
Now, such differences are quite difficult to represent here on Wikidata, because Wikidata is not just about interlinks but about structured data. If the same concept can be expressed by several words, or one word can express different concepts, our work here becomes too hard, because we simply won't know which item to use.
So, after months of discussion, there is this new proposal, based on lexemes rather than pages. It means that "burro" will have two separate lexemes, one for the Italian meaning and one for the Spanish meaning. The same applies to "apple", "mela", "Apfel", "manzana", "maçã" and "pomme": one lexeme per language, each with several forms and several meanings (all related to that language). This will also have to wait, because there are still some problems with retrieving data from different items.
This is why I'm telling you that phase 1 for Wiktionary cannot start right now: a completely different approach, on a completely different basis, is being studied. Right now we would risk doing the same work twice: once in vain, and a second time for good (maybe). Since there is no rush, and since the next project on the list is Wikimedia Commons, I simply propose that we wait for the developers to tell us if there is any news, without putting much pressure on them.
I hope what I mean is clearer this time. Sannita - not just another it.wiki sysop 20:54, 9 September 2013 (UTC)
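The lexeme-based idea described above (one entry per spelling and language, rather than one per page) can be illustrated with a toy data structure. This is only a hypothetical sketch: the proposal's actual data model, field names and sense glosses are not specified in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    # Hypothetical: a lexeme is identified by its lemma *and* its language.
    lemma: str
    language: str
    forms: tuple[str, ...]
    senses: tuple[str, ...]


LEXEMES = (
    Lexeme("burro", "it", ("burro",), ("butter",)),
    Lexeme("burro", "es", ("burro", "burros"), ("donkey",)),
    Lexeme("apple", "en", ("apple", "apples"), ("fruit of the apple tree",)),
    Lexeme("pomme", "fr", ("pomme", "pommes"), ("fruit of the apple tree",)),
)


def lexemes_spelled(spelling: str) -> list[Lexeme]:
    """All lexemes, in any language, that have this spelling among their forms."""
    return [lx for lx in LEXEMES if spelling in lx.forms]
```

Looking up the spelling "burro" returns two separate lexemes, one Italian and one Spanish, each with its own senses, whereas a page-based model would have a single "burro" page.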
I don't mean to rush, sincerely. However, there is a clear misunderstanding here. As far as I know, "language links" are defined as links between equivalent pages on projects in other languages. If we want to create links between lexemes (not pages), then they shouldn't just be called "language links". Every time I saw a discussion about "interwikis" and Wikidata on Wiktionary, everyone spoke about links between pages, not lexemes. Now, let's say that we only work on links between lexemes; the article pages on each Wiktionary will still remain, and they will still include normal language links. Does the proposal mean that we will get rid of those "normal" language links, or that they will remain as they are? That is the part that is unclear. Darkdadaah (talk) 09:03, 10 September 2013 (UTC)
You have only seen discussions about interwikis because this proposal is fairly new: it was drafted last August. I don't know how links between [[en:word]], [[fr:word]] and [[it:word]] will work. The proposal currently says that "most of these links can be done trivially by merely linking to the page with the same name on the other Wiktionary", but I'm not comfortable with this. Still, I don't know what to propose. --Sannita - not just another it.wiki sysop 21:09, 10 September 2013 (UTC)
The two concepts (links between articles, which I will call "interlanguage links", and links between words/lexemes, which I will call "lexeme links") have been confused, so much so that doing both may seem redundant, but this is hardly the case: we can't derive interlanguage links from lexeme links, or vice versa, and we need both. The "interlanguage links" are exactly what I wanted to address (the "most of these links" is reflected in the number of "direct interwikis" in the table above). Darkdadaah (talk) 12:30, 19 September 2013 (UTC)

I think that plain interwiki links should be noncontroversial, and would save a lot of distraction from bot edits. Cheers! BD2412 (talk) 17:58, 19 September 2013 (UTC)

I concur with my fellow Wiktionary administrator BD2412: if Wikidata could take interwiki linking (linking between wikt:en:Katze and wikt:de:Katze) off the Wiktionaries' hands, that would be very helpful and hopefully non-controversial. By contrast, you must be aware of how much debate has surrounded the suggestion of linking meanings (e.g. wikt:en:cat#English and wikt:de:Katze#Deutsch). In fact, one (perhaps not fully formed) question has just occurred to me: where will wikt:de:Katze#Englisch fit into the system of links between wikt:en:cat#English and wikt:de:Katze#Deutsch? Regarding apostrophes: Wikidata could directly link between wikt:en:c'est and wikt:fr:c’est, but I think it could also achieve the same effect by allowing interwiki links to redirects, as the Wiktionaries already do. Note that wikt:fr:c’est la vie links to wikt:en:c’est la vie, which redirects to wikt:en:c'est la vie, which links to wikt:fr:c'est la vie, which redirects to wikt:fr:c’est la vie. -sche (talk) 18:37, 19 September 2013 (UTC)
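The chain described above works because a link that lands on a redirect is simply followed to its target. A minimal sketch, assuming a hypothetical in-memory redirect table keyed by (edition, title) that mirrors the "c'est la vie" example; the hop limit is an assumed safeguard against redirect loops.

```python
# Hypothetical redirect table mirroring the "c'est la vie" example above.
REDIRECTS: dict[tuple[str, str], str] = {
    ("en", "c’est la vie"): "c'est la vie",
    ("fr", "c'est la vie"): "c’est la vie",
}


def resolve(wiki: str, title: str, max_hops: int = 5) -> str:
    """Follow redirects on one edition until a real page is reached."""
    start = title
    hops = 0
    while (wiki, title) in REDIRECTS:
        if hops >= max_hops:
            raise ValueError(f"too many redirects starting from {wiki}:{start}")
        title = REDIRECTS[(wiki, title)]
        hops += 1
    return title
```

So a link from fr:c’est la vie to en:c’est la vie ends up on en:c'est la vie, and the link back to fr:c'est la vie ends up on fr:c’est la vie; pages that are not redirects resolve to themselves.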
I also support Darkdadaah's proposal (wikt:en:Katze, wikt:de:Katze, wikt:fr:Katze, ...). Pamputt (talk) 20:21, 19 September 2013 (UTC)

Outsider question

While I understand perfectly well why the Wiktionaries vitally need a database solution like this (in general terms), I cannot quite see how it would be beneficial to lump it together with Wikidata (as it is now). Are there any intersections at all? You know, it is already hard to follow all the RfCs and discussions on semantic properties and merging items... And isn't the architecture of Wiktionary bound to be fundamentally different? Littledogboy (talk) 02:19, 28 September 2013 (UTC)

Support

I just want to express my support for this proposal and to back Denny's arguments expressed above in the section "Why?". As a computational linguist, I find that it is exactly the current lack of structure in Wiktionary, and the lack of data sharing among its language versions, that makes it very difficult for me to process any Wiktionary data automatically and that discourages me from active participation in the Wiktionary project. Just before discovering this proposal and this discussion, I used arguments that are to a great extent similar to Denny's at the Czech Wiktionary's village pump. Blahma (talk) 14:42, 28 September 2013 (UTC)
