Wikidata:Lexicographical data/Development/Proposals/2013-02

Here are some notes for Wikidata support in Wiktionaries.

Phase 1 - interwiki

One possible approach to organising interwiki information on Wikidata would be to create a new namespace in which items are named wikt:name_of_item rather than Q123 (numeric IDs are unnecessary because the linked pages all share the same title).

It might also be possible to use this function to help with the cross-wiki links in translation tables. For example, this is part of the translation table for dog on en.Wikt:

French: chien m (fr)

It would be nice to use Wikidata's data to show the "(fr)" link in red or blue depending on whether frwikt has a page called "chien". Currently bots do this work.
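A minimal sketch of the red/blue decision, assuming an item exposes its linked pages as a mapping from language code to page title. The function name `link_colour` and the `chien_sitelinks` data are hypothetical and do not correspond to an existing Wikidata API.

```python
# Hypothetical sketch: choose the colour of the "(fr)" link from an item's
# linked pages. The data layout is an assumption made for illustration.

def link_colour(sitelinks: dict, lang: str) -> str:
    """Return "blue" if the item lists a page for `lang`, otherwise "red"."""
    return "blue" if lang in sitelinks else "red"


# Assumed linked pages of a hypothetical item wikt:chien (language -> title).
chien_sitelinks = {"en": "chien", "fr": "chien"}

print(link_colour(chien_sitelinks, "fr"))
print(link_colour(chien_sitelinks, "ko"))
```

With such a lookup, the translation table would not need a bot run to keep the link state current.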

Two problems with this

  • What about project-space? Presumably we would want to link together all the Community Portals, etc, so we can get rid of interwiki bots. This would have to be more like Wikidata's original Phase I setup.
  • Couldn't this just be populated automatically from the databases, bypassing Wikidata altogether?
    • What about wikis with different scripts/naming conventions (e.g. frwikt's use of curly apostrophes)?
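The naming-convention problem can be illustrated with a small sketch. It assumes the only difference between two wikis is the curly apostrophe (U+2019) versus the straight one; real wikis may differ in other ways, and the function name `normalise_title` is hypothetical.

```python
# Hypothetical sketch: map a title that uses a curly apostrophe to the
# straight-apostrophe form used elsewhere. Only this one convention is
# handled; other per-wiki rules are out of scope here.

CURLY_APOSTROPHE = "\u2019"


def normalise_title(title: str) -> str:
    """Replace curly apostrophes with straight ones."""
    return title.replace(CURLY_APOSTROPHE, "'")


print(normalise_title("aujourd\u2019hui") == "aujourd'hui")
```

Titles that are identical after such normalisation could be matched automatically, but any rule of this kind would have to be agreed per wiki.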

Phase 2

  • Language-specific wikt:items should be linked to from related Wikipedia items.
  • For wikt:items, special properties will be needed — pronunciation in various languages, declension, type, etc., which will have no use in Wikipedia-specific items.
  • There are some — hopefully — community-agnostic properties that would be easier to do before even thinking about pronunciation and such (which would mean changing the whole structure of Wiktionaries and unifying the various language communities...). Namely:
    • Sort keys, e.g. the word espérer in language fr has sortkey esperer (simplified, it's a bit more complicated than that). This should be the same for all categories in a given language.
    • See also e.g. the articles mere, mère, Mere, which must have a link between them. (On some wikis this is currently done, some of the time, using templates like en:wikt:Template:also.)
  • Language information. Right now every language is identified by a code (e.g. en = English) that is stored in a template (e.g. wikt:en:Template:en) in most Wiktionary versions, with other information stored in yet another template. We could centralize all of this information in Wikidata: language name in every language, language code, script used, Wikimedia code, etc. Right now the best alternative to isolated templates is a Lua module (e.g. wikt:en:Module:languages).
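The sort-key and see-also ideas in the list above can be sketched together. This is a simplified illustration, assuming that a sort key is obtained by stripping combining diacritics and that see-also candidates are titles that coincide once case is also ignored; as noted above, real sort keys are more complicated. The function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def sort_key(title: str) -> str:
    """Strip combining diacritics (simplified sort key)."""
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def see_also_groups(titles):
    """Group titles that are equal after removing diacritics and case."""
    groups = defaultdict(list)
    for title in titles:
        groups[sort_key(title).lower()].append(title)
    # Only groups with more than one member need see-also links.
    return {key: group for key, group in groups.items() if len(group) > 1}


print(sort_key("espérer"))
print(see_also_groups(["mere", "mère", "Mere", "most"]))
```

Storing such a key once on a shared item would let every Wiktionary reuse it instead of maintaining its own template.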

Example

item named Wiktionary:Most

Statements
  • IPA-de: mɔst
  • pronounc-de:
  • IPA-cs: mɔst
  • pronounc-cs: Cs-most.ogg
List of pages linked to this item
[[cs:Most]]
[[de:Most]]
[[en:Most]]
[[fr:Most]]
[[hu:Most]]
[[ko:Most]]
[[pl:Most]]
[[ru:Most]]
[[sv:Most]]

item named Wiktionary:most

Statements
  • IPA-de: mɔst
  • IPA-cs: mɔst
  • IPA-en-uk: məʊst
  • IPA-en-us: moʊst
  • IPA-hu: moʃt
  • IPA-pl: mɔst
  • IPA-sh: môːst
List of pages linked to this item
[[cs:most]]
[[de:most]]
[[en:most]]
...
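
As a rough sketch, an item like Wiktionary:most could be modelled as a record holding statements and linked pages. The class `WiktItem` and its fields are hypothetical, chosen only to mirror the example; the values are copied from the lists above, and the elided linked pages are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WiktItem:
    """Hypothetical model of a wikt: item, not an actual Wikidata schema."""
    name: str
    statements: dict = field(default_factory=dict)  # property -> value
    sitelinks: dict = field(default_factory=dict)   # language -> page title

    def ipa(self, lang: str) -> Optional[str]:
        """Look up the IPA statement for a language code, if present."""
        return self.statements.get(f"IPA-{lang}")


most = WiktItem(
    name="Wiktionary:most",
    statements={
        "IPA-de": "mɔst",
        "IPA-cs": "mɔst",
        "IPA-en-uk": "məʊst",
        "IPA-en-us": "moʊst",
        "IPA-hu": "moʃt",
        "IPA-pl": "mɔst",
        "IPA-sh": "môːst",
    },
    # Only the linked pages listed explicitly in the example.
    sitelinks={"cs": "most", "de": "most", "en": "most"},
)

print(most.ipa("hu"))
print(most.ipa("fr"))
```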

See also
