Wikidata:Comparison of Projects and Proposals for Wiktionary

The purpose of this work is to elaborate a new proposal, based on the two previous ones, for how Wikidata can support Wiktionary by providing structured linguistic data to be represented in Wiktionary. This includes answering certain questions about the future structure of Wikidata, as not all decisions are immediately obvious. Examples include the relevance and differentiation of part of speech in Wikidata, the handling of homonyms and synonyms and their structured representation, and the structure or inclusion of entries that can aggregate different senses. In addition, the structure of similar projects (WordNet, EuroWordNet and OmegaWiki) as well as lemon, a model for sharing lexical information, will be taken into account.

There will be an example item in each section to illustrate the differences between them. In each case, the example shows how the different structures represent the word "Hamburger" in English, meaning "person from Hamburg", and the word "Hamburger" in German, which can mean either "person from Hamburg" or "hot sandwich consisting of a patty of cooked ground beef, in a sliced bun, sometimes also containing salad vegetables, condiments, or both" (note that the latter can be translated as "hamburger", which, however, is not the same as "Hamburger" due to capitalization). In each system, a box represents an entry, i.e. a separate item page. In most cases, the contents of these boxes are heavily reduced in order to obtain a more schematic representation.

Terminology

In order to avoid misunderstandings, what follows is a short outline of a terminology that will be used throughout the document. A more extensive glossary of terms used in the following chapters can be found under Further Wikidata-Terminology. Those terms that are only needed in certain sections will be introduced there.

surface form
This term is used to describe only the graphical surface form (the way a word is written), disregarding the phonetic surface form (the way a word is pronounced) and so on. This also means that the surface form of the word "go" as the French word for the board game is equal to that of "go" as the English verb. The words "Polish" and "polish", however, do not share the same surface form, because they differ in capitalization.
gloss
A gloss is a short explanation of what a word/expression means. It need not be as detailed as a definition that could be used interchangeably with the word/expression. An example of a word and its gloss is "Fado - a type of music".
expression
The term expression refers to identical surface forms that also share the same language and the same morphological features, and thus also belong to the same word category. The English adjective "blue" referring to the color and the English adjective "blue" referring to a melancholic state of mind are, for example, the same expression (just with different meanings). The English verbs "tear" (to rend by holding or restraining in two places and pulling apart, whether intentionally or not; to destroy or separate) and "tear" (to produce tears) are not, because their morphological values differ: their inflection forms are tear/tore/torn versus tear/teared/teared. The English noun "bike" and the English verb "bike" are not the same expression because their word categories differ, and neither are the English noun "chat" (an exchange of text or voice messages in real time through a computer network) and the French noun "chat" with the same meaning, because their languages differ.
sense
A sense is a word/multi-word expression combined with its associated meaning. The English word "chat" can mean to engage in informal conversation, and the French "chat" can have the same meaning or the meaning "cat"; nevertheless their senses differ, because the expressions differ. A sense depends on the expression it belongs to. To give another example: "Le Mépris" (film by Godard) and "Pierrot le fou" (film by Godard). The sense in the first example is "Le Mépris" (film by Godard), and in the second example it is "Pierrot le fou" (film by Godard).
entry
The term entry denotes the basic presentation-unit in any of the different dictionaries and is in this respect equal to the term page. For Wikidata, an entry is a wiki page, and is the basic editorial unit, used to track edit histories, authorship, etc.
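The terminology above can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: the class names `Expression` and `Sense`, their fields and the glosses for "blue" are hypothetical and not part of any existing Wikidata or Wiktionary data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """Same surface form + language + word category + morphology (hypothetical model)."""
    surface_form: str        # written form only; comparison is case-sensitive
    language: str
    word_category: str
    inflection: tuple = ()   # simplified stand-in for morphological features


@dataclass(frozen=True)
class Sense:
    """An expression combined with one of its meanings, identified here by a gloss."""
    expression: Expression
    gloss: str


blue = Expression("blue", "English", "adjective")
blue_colour = Sense(blue, "having the colour blue")   # illustrative gloss
blue_mood = Sense(blue, "melancholic")                # illustrative gloss

tear_rend = Expression("tear", "English", "verb", ("tear", "tore", "torn"))
tear_weep = Expression("tear", "English", "verb", ("tear", "teared", "teared"))
chat_en = Expression("chat", "English", "noun")
chat_fr = Expression("chat", "French", "noun")

print(blue_colour.expression == blue_mood.expression)  # True: one expression, two senses
print(tear_rend == tear_weep)                          # False: different inflection forms
print(chat_en == chat_fr)                              # False: different languages
```

Modelling an expression as an immutable value makes "being the same expression" a plain equality test, which mirrors how the definitions above are phrased.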

Wiktionary and Wikidata

This section presents how Wiktionary and Wikidata currently work, in order to establish a basis for evaluating the three proposals. These descriptions are followed by a short motivation of why combining the two wiki projects would be helpful.

Wiktionary

Fig. 1: Example entry "Hamburger", Wiktionary
Fig. 2: Alternative Example entry "Hamburger", Wiktionary

Wiktionary's approach is to be an open multilingual dictionary with entries from all languages. In the English Wiktionary there are entries for the German word "Unterstrich", the Hungarian word "tüzes" and so on. All information in such an entry (apart from the foreign-language word itself) is written in English, and English words have entries as well. In the Thai Wiktionary, entries are written in Thai, and so on. Currently, there are entries from 1062 languages, of which 522 have more than ten entries. Currently, there are 170 language versions[1].


Entries are structured as follows: an entry covers all senses from all languages for one surface form. This means that both the German word "Arm" and the English word "arm" are combined in the same entry. The subsections are ordered by language (the language of the respective Wiktionary comes first) and by POS. That is, the text of a page covering a surface form is divided by part of speech, and each part of speech is subdivided by a numbered list of senses if it has more than one sense. If items such as pronunciation or etymology are shared, they can appear at the beginning of a common section; if they differ, they are split into sections of their own. There are slight differences between the language versions. For example, the German Wiktionary also includes translations into other languages (not to be confused with Wiktionaries in other languages that use the same approach) and links to them, whereas the English or French Wiktionaries, for instance, do not list them for other languages. Apart from such details, the different language versions follow the same approach.


The information available in Wiktionary includes pronunciation (IPA (/SAMPA) transcription and audio files), etymologies, anagrams, synonyms, hypernyms etc., inflection tables, example sentences, translations (and links to the translations), links to the corresponding entries placed after the definition(s) of the sense(s), and links to other Wiktionary categories. Wiktionary also has entries for compounds, acronyms, abbreviations, common misspellings and simplified spellings.

Figure 1 shows how Wiktionary represents the "Hamburger" example.

Note that in some Wiktionaries the pronunciation of the two German words may be placed before the sections for the individual words, as in the example above, in order to avoid repeating it (see Figure 2).

Wikidata

Fig. 3: Example statement in Wikidata

Wikidata is an open database that stores structured data in a language-independent way. Information about items is shared across all languages: the labels of concepts (attached to items) and of linked items are available in the different languages, but they are tied to the ID of the item(s). In addition, a third type of entity is planned: queries. These will be used to generate lists, such as a list of food additives or a list of rivers with relevant information.

Every item can have a list of statements. The information that Berlin (Q64) has the status of a state in Germany for example is represented by a statement claiming “type of administrative division: state of Germany”. These claims, consisting of a property and a value, and potentially qualifiers, are accompanied by a (possibly empty) list of references. An example statement is given in Figure 3.
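The anatomy of a statement described above (a claim made of property and value, optional qualifiers, and a possibly empty list of references) can be sketched as follows. This is a simplified illustration, not the actual Wikibase data model or API; the class and field names are assumptions chosen for readability.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A property-value pair with optional qualifiers (hypothetical structure)."""
    property: str
    value: str
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Statement:
    """A claim accompanied by a (possibly empty) list of references."""
    claim: Claim
    references: list = field(default_factory=list)


@dataclass
class Item:
    """An item with language-dependent labels and language-independent statements."""
    item_id: str
    labels: dict
    statements: list = field(default_factory=list)


berlin = Item("Q64", {"en": "Berlin"})
berlin.statements.append(
    Statement(Claim("type of administrative division", "state of Germany"))
)

for statement in berlin.statements:
    claim = statement.claim
    print(f"{berlin.item_id}: {claim.property}: {claim.value} "
          f"({len(statement.references)} references)")
```

Keeping labels per language but statements per item reflects the point that the information itself is shared across all languages.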

The properties (in Fig. 3: "population") are entities described in the property namespace and can be created by users just like items. Examples are: date of birth (P569, date on which the person was born), signature (P109, image of a person's signature) or ancestral home (P66, place of origin in China of a person's ancestors). The information currently available in Wikidata is entered by users, partly by hand and partly by bots. The underlying software is Wikibase. Obviously, there can be no example of how the "Hamburger" words are represented yet, since the possible ways of representing linguistic items in Wikidata are the very subject of this document.

Motivation

It seems obvious that having structured linguistic data available would be of great help to the 170 language versions of Wiktionary.

  • Firstly, it would reduce the editing-effort, as information can be drawn from the database automatically if desired.
  • Secondly, as the same holds for corrections that could then have an effect in all entries in all languages at once, a higher information quality can be achieved.
  • Thirdly, this may lead to more extensive entries also in Wiktionaries from smaller languages.
  • Fourthly, having a vast collection of free, structured linguistic data will be very useful for natural language processing applications, researchers, linguists and people “just personally” interested in linguistic structures that can be browsed easily.

Acknowledging the need for certain unstructured information (at the very least, definitions and glosses of foreign words in one's own language, which may and will differ by context due to different cultures/languages), there is no desire to replace Wiktionary with Wikidata. The aim is rather to help maintain and grow it by providing both a structure and a base for storing information.

Comparison of the Structure of other Projects

In this section, the basic structures of the projects WordNet, EuroWordNet and OmegaWiki are compared in order to show the structural differences between them and Wiktionary.

WordNet

Figure 4: Example entry "Hamburger", WordNet

Structure

WordNet is a free dictionary for English by Princeton University. Every entry (a word or multi-word term) is associated with one or more so-called synsets. These group together words that are synonymous in a certain context; see the two synsets for the expression "copper" below as examples. Further, a gloss is provided, giving a short explanation/definition of the synset. In most cases, there are also example sentences.[2]

  1. S: (n) bull, cop, copper, fuzz, pig (uncomplimentary terms for a policeman)
  2. S: (n) copper, copper color (a reddish-brown color resembling the color of polished copper)

These synsets are linked by ontological relations, mainly the hyponym/hypernym relation. This means that, for example, the synset {photograph, photo, exposure, picture, pic} can be represented as a direct hypernym of {shading picture}, {still} or {snapshot, snap, shot}, while it acts as a direct hyponym of {representation}. Directed relations such as "derived from" are included as well. The "subnets" for nouns, verbs, adjectives and adverbs are handled separately according to their POS tags. The structure, however, is the same in each of them. In total, there are 117 000 synsets.
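A minimal sketch of this structure, using only the synsets quoted in this section: synsets are sets of words, some with a gloss, and a separate mapping records the direct hypernym of a synset. The keys such as `"cop"` or `"photograph"` are hypothetical identifiers invented for this illustration; this is not the WordNet file format or any WordNet library API.

```python
# Synsets quoted in the text; the dictionary keys are made-up identifiers.
SYNSETS = {
    "cop": ["bull", "cop", "copper", "fuzz", "pig"],
    "copper_color": ["copper", "copper color"],
    "photograph": ["photograph", "photo", "exposure", "picture", "pic"],
    "snapshot": ["snapshot", "snap", "shot"],
    "representation": ["representation"],
}

GLOSSES = {
    "cop": "uncomplimentary terms for a policeman",
    "copper_color": "a reddish-brown color resembling the color of polished copper",
}

# Direct hypernym of each synset (child -> parent).
HYPERNYM = {
    "snapshot": "photograph",
    "photograph": "representation",
}


def synsets_for(word):
    """Return the identifiers of all synsets that contain the given word."""
    return sorted(key for key, words in SYNSETS.items() if word in words)


def hypernym_chain(key):
    """Follow the hypernym relation upwards and return every ancestor in order."""
    chain = []
    while key in HYPERNYM:
        key = HYPERNYM[key]
        chain.append(key)
    return chain


print(synsets_for("copper"))       # ['cop', 'copper_color']
print(hypernym_chain("snapshot"))  # ['photograph', 'representation']
```

Note that a sense is identified here by set membership rather than by a definition text, which is the contrast with Wiktionary that the comparison in this section draws.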

Terminological Contrasts and Comparison with Wiktionary

In regards to our terminology, one WordNet-item correlates with the term expression: one surface-form with a certain language (in WordNet: English) and one POS-tag. A WordNet-synset is similar to what we define as sense – a certain meaning of a linguistic entity. Here, however, it is represented by a set of synonyms, whereas Wiktionary represents a sense by attaching a gloss/definition to the linguistic entity.

WordNet works by linking synsets through these and other relations, in the manner of a monolingual thesaurus. One major difference to Wiktionary lies in the different representation of senses (synsets versus glosses and definitions). Moreover, WordNet, unlike Wiktionary, provides no morphological or phonological information about words, does not cover colloquial usage and offers no translations.

The Hamburger-Example

Note that WordNet covers English words only, so the German words of the Hamburger example are not represented. Moreover, WordNet has no entry for the capitalized word "Hamburger" in English. Figure 4 shows the example entry for "Hamburger" as it would look if it followed the WordNet scheme. As shown, there is only one sense in this case. Also, the entry provides no synonyms, only a gloss and relational information.

EuroWordNet

Figure 5: Example entry "Hamburger", EuroWordNet

Structure

Although WordNet covers English words only, with the start of the EU project EuroWordNet, wordnets for Dutch, Spanish, Italian, German, French, Czech and Estonian were created and linked, which led to a multilingual database. Using interlingual links stored in the Inter-Lingual-Index (ILI), the wordnets of one language are connected to those of another. As the purpose of these links is to capture "equivalence" across languages, no relations are established between the individual ILI records themselves. That task remains with the individual wordnets. This also makes it easy to extend them, because no agreement has to be maintained among all parties.
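The role of the Inter-Lingual-Index can be sketched as a lookup table: each language-specific synset points to an ILI record, and equivalents in another language are found by going through that record. All identifiers (`en-1`, `ili-1`, ...) are hypothetical and the layout is an assumption for illustration only, not the actual EuroWordNet database format.

```python
# Per-language wordnets: synset identifier -> member words (identifiers are made up).
WORDNETS = {
    "en": {"en-1": ["door"]},
    "de": {"de-1": ["Tür"]},
    "nl": {"nl-1": ["deur"]},
}

# Equivalence links: (language, synset) -> ILI record.
# The ILI records themselves are unstructured: no relations exist between them.
TO_ILI = {
    ("en", "en-1"): "ili-1",
    ("de", "de-1"): "ili-1",
    ("nl", "nl-1"): "ili-1",
}


def equivalents(language, synset_id, target_language):
    """Return the synsets of target_language linked to the same ILI record."""
    record = TO_ILI.get((language, synset_id))
    if record is None:
        return []
    return [
        sid
        for (lang, sid), rec in TO_ILI.items()
        if lang == target_language and rec == record
    ]


for sid in equivalents("de", "de-1", "nl"):
    print(WORDNETS["nl"][sid])         # ['deur']

print(equivalents("en", "en-1", "fr"))  # []: no French synset is linked in this toy data
```

Adding a further language in this layout only requires new entries in `TO_ILI`; the existing wordnets stay untouched, which is the extensibility argument made above.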

Language-internal relations have been broadened with the start of the project, new ones were added and relations now have features such as conjunction or disjunction - "airplane" can have the meronyms "door", "jet airplane" and "propeller". The word "door" can have the holonyms "car", "room" or "airplane". Also, in EuroWordNet, links between synsets with different POS-tags are stored.

Aligning these items can be difficult, however: concepts may not exist in every language, or they may not match when their incoming and outgoing relations are taken into account. For example, a concept may be a direct hyponym of another concept in one language but not in another. It is therefore difficult to draw inferences or to reach consistency across the linked languages.

Terminological Contrasts and Comparison with Wiktionary

The terms item, synset and gloss are the same as in WordNet, as explained above. The POS constraint between synsets from different languages, however, is relaxed: here, equivalence links can also hold between synsets and items with different POS. Like WordNet, EuroWordNet uses synsets as the means of representing senses, which is a major difference to Wiktionary. Unlike WordNet, EuroWordNet is multilingual in that it links the wordnets of seven languages. Another important difference to Wiktionary, however, again lies in the lack of morphological and phonological information.

The Hamburger-Example

As EuroWordNet, unlike WordNet, is not free to use, the figure above was constructed according to what the entry should look like, judging from the structure and from other examples.

OmegaWiki

Fig. 6: Example entries "Hamburger", OmegaWiki

Structure

OmegaWiki is an open multilingual dictionary that aims to describe all words of all languages with definitions in all languages, including lexical, terminological and thesaurus information.

The internal structure relies on entries regarding one DefinedMeaning (DM), which is a combination of an expression together with its definition. This definition is regarded to be language-independent and therefore translated into the different languages. Speaking in the terms discussed in the terminology, one DefinedMeaning thus corresponds to a sense. Hence, in example 3 and 4, there are separate pages for the following because they are two distinct DefinedMeanings – in the first case, there is the expression “song” combined with the definition “A musical piece with lyrics…”, in the second one, the expression “song” is combined with the definition “The act of singing”.

  3. song: A musical piece with lyrics (or "words to sing"); prose that one can sing.
  4. song: The act of singing.

In addition, there are DM entries in other languages for the same surface form. In this example, there is an entry for the Faroese word "song", which translates as "bed". This one, however, is represented in a separate entry. For each language there are lists of the different meanings a word can have, i.e. a list of all DMs associated with the expressions of that language. However, these pages merely list the existing DMs for each expression together with their information, and it is not possible to share common information (for example, information that holds within one language) between different DefinedMeanings. Such information has to be copied to all affected entry pages.
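A small sketch of the DefinedMeaning idea, using the "song" examples from this section. It assumes a simplified model in which a DM pairs one expression with a definition that is stored per language as a translation; the class name, the numeric identifiers and the field names are hypothetical and do not reflect OmegaWiki's actual database schema.

```python
from dataclasses import dataclass


@dataclass
class DefinedMeaning:
    """One expression combined with one definition (hypothetical, simplified)."""
    dm_id: int
    expression: str
    language: str
    definitions: dict  # language code -> translation of the same definition


DMS = [
    DefinedMeaning(1, "song", "en", {"en": "A musical piece with lyrics"}),
    DefinedMeaning(2, "song", "en", {"en": "The act of singing"}),
    DefinedMeaning(3, "song", "fo", {"en": "bed"}),  # Faroese word, separate entry
]


def meanings_of(expression, language):
    """List every DefinedMeaning page for an expression of a given language."""
    return [dm for dm in DMS if dm.expression == expression and dm.language == language]


for dm in meanings_of("song", "en"):
    print(dm.dm_id, dm.definitions["en"])
```

Because each DM is a page of its own, information common to DMs 1 and 2 has no shared place to live in this layout and would have to be stored on both, which is the duplication problem described above.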

Besides these DefinedMeaning pages, OmegaWiki also stores certain semantic and ontological relations between DMs. These include synonyms, antonyms and similar relations, as well as translations into other languages and so on.

For these relations, it is possible to distinguish between exact and inexact relations. One example is synonymy: while the English word "German" says nothing about the gender of the German person referred to, the German word "Deutsche" (as opposed to "Deutscher") additionally carries the information "sex: female". As there is no word that translates exactly to the English word "German", which carries no information about gender, one cannot simply use the hyponymy/hypernymy relation to describe the link between these expressions. In the database, this is indicated by the symbol "~", which means that the translation is not exact.

Users can decide in which language they want to use OmegaWiki. There are more than 300 languages. If an entry or a piece of information is not available in a language, the items are displayed in English.

Terminological Contrasts and Comparison with Wiktionary

OmegaWiki provides translations, definitions in different languages and information about semantic relations. In this respect, its approach is similar to that of Wiktionary. One major difference, however, lies in the independence of the different language versions of Wiktionary. In Wiktionary, words can be defined/described in a language-specific way. In OmegaWiki, what is stored are translations of the definitions of the concepts associated with a given sense.

The Hamburger-Example

The example of the German/English word "Hamburger" in OmegaWiki is shown in Figure 6.

As explained above, there are separate pages for each DM. However, as shown in the box on the left-hand side, there are no separate pages for the German "Hamburger" (person from Hamburg) and the English "Hamburger" (person from Hamburg). Note also that the title on the right-hand side is "hamburger" instead of "Hamburger", and that the German equivalent appears as a translation.

Overview

The above-mentioned projects WordNet, EuroWordNet, OmegaWiki and Wiktionary differ from each other to quite a large extent. Especially as they partly pursue different objectives, it is hard to say which of them is, in general, "better" than the others with respect to its underlying structure.

With regard to the differences above, some of the major differences between the four projects are shown in the table below.

Table: Comparison of the different projects
Columns: Project | Free | Open source | Scope of an entry | No. of languages: expressions | No. of languages: definitions | Translations

Naturally, however, each project has advantageous and disadvantageous aspects that can be regarded as major features of a dictionary's structure. These are explained in this section.

Representation of Translations and Synonymies

Among the projects discussed here, there are two different approaches to this issue. One is to link linguistic items from different languages to (certain) shared items. The other is to establish translations and synonymies directly between senses. The advantage of the first approach lies in the smaller number of links: in the second approach, there are links from each language to every other language, which makes it more complex. However, the second approach allows for better translations and synonyms, which can lead to higher-quality information being stored in the dictionary. It may be seen as a great advantage of online dictionaries that space is not a concern as it is for paper dictionaries, so one can exploit this fact and choose the second approach for translation/synonymy.
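The difference in the number of links can be made explicit with a small calculation. The sketch below assumes one sense per language for a single concept and counts undirected links; the function names are hypothetical. With a shared item, each of the n senses needs one link, whereas direct links between all pairs of senses require n(n-1)/2.

```python
def links_via_shared_item(n_languages):
    """One link from each language-specific sense to the shared item."""
    return n_languages


def links_between_senses(n_languages):
    """One undirected link for every pair of language-specific senses."""
    return n_languages * (n_languages - 1) // 2


for n in (3, 10, 170):
    print(n, links_via_shared_item(n), links_between_senses(n))
# 3 3 3
# 10 10 45
# 170 170 14365
```

For the 170 language versions mentioned earlier, a single concept would thus need 170 links in the first approach and 14365 in the second, which illustrates why the second approach is only practical where storage space is not a constraint.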

Of the projects above, only Wiktionary uses the latter approach: both EuroWordNet and OmegaWiki use abstract items that act as "hubs" for the linguistic entries (please see the structure figures in the respective sections), while WordNet does not deal with languages other than English in the first place and therefore has no translations at all. With regard to synonymy, it links the members directly, without using an abstract item.

Required Knowledge of Foreign Language and Language Specificity of Definitions

One of the most important features of Wiktionary's structure is that understanding what a foreign-language word means does not require an advanced understanding of that language. This is different in monolingual dictionaries such as WordNet, which shows no links between languages and cannot be measured against this criterion. In OmegaWiki, definitions can be translated into all languages, which makes it multilingual. Nevertheless, the content of the definition is translated rather than written in a language-specific way. Fine differences in meaning may therefore not be shown in certain cases. EuroWordNet also ensures a language-specific "definition" through the linking of different languages. Here, however, a "translation" can only be represented if an equivalent synset exists in that language. If this is not the case, there is no way to represent the meaning of a synset in a different language; it can only be obtained by using the relational information in the wordnets. Nevertheless, this cannot be regarded as a shortcoming of EuroWordNet, as its chosen goal is to show equivalence relations between languages and not to provide translations of foreign-language items.

The Scope of an Entry

It can be very convenient to represent the different senses of one expression together, for example when a user wants to look something up and does not know exactly which sense is meant; in such cases a gloss may not suffice to choose between the various senses. If the senses are grouped, the effort of searching manually is reduced. Moreover, all information shared between different senses (this may include pronunciation, etymology, morphology etc.) can be displayed efficiently, with shared items shown accordingly. There is no major problem with displaying the different senses of expressions grouped together, and there are certain advantages: the structure can become clearer and shorter, and browsing becomes easier for the user.

Of the projects discussed, all use this grouping of senses for display, even if the representation differs. WordNet and EuroWordNet refer to senses by using synsets, OmegaWiki allows either a DefinedMeaning (one sense) or an expression (with its different senses) to be displayed on one page, and Wiktionary even groups different expressions, possibly from different languages, if the surface form is the same. As none of them prevents grouped senses from being displayed when the database is queried, this does not serve as a distinguishing feature between EuroWordNet, WordNet, OmegaWiki and Wiktionary, but it should be regarded as a positive aspect when it comes to the linguistic part of Wikidata (even if realizing it may be a considerable effort). However, only Wiktionary allows structural flexibility with regard to storing information that attaches to more than one sense.

Covered Linguistic Material

WordNet and EuroWordNet cover words or multi-word terms from certain parts of speech. They do not include phrases, non-content items, colloquial language or inflected forms. Wiktionary covers these, while OmegaWiki includes them in part and could at least do so thanks to its underlying structure.

Features

With regard to the possibilities of representing different kinds of linguistic information, the projects do not differ much in certain respects: all four have example sentences, translations (except for WordNet) and, obviously, some form of definitions. The same holds for relations such as antonymy, hypernymy etc. Information on phonology or morphology (especially inflected forms), however, is not shown in WordNet or EuroWordNet, and etymological information is covered only to a very small extent by these two as well as by OmegaWiki. Of the four projects, Wiktionary is the only one that covers this aspect. Audio files can be included in OmegaWiki and in Wiktionary.

Lemon

As the lemon model could be a promising model for our purpose, its main structure is presented briefly.

Structure

The purpose of "lemon" is to provide a model for "sharing lexical information on the semantic web". In our case, it may be helpful for the structure of Wikidata, as it lays out a structure with the level of granularity that we wish to represent in the third proposal: it is language-dependent and distinguishes between the surface form of words and their meaning, i.e. the ontology entry. For both of these types of entities, many relations can be represented, and they can further be classified as "general" as opposed to "variant" and so on. The different types are structured as shown in the figure and are explained separately.

Lexicon
The Lexicon contains all entries written in a certain language and is marked with the corresponding language.
Lexical Entry
A Lexical Entry represents one lexeme, i.e. a word or multi-word term in a certain language that has one or more forms and one or more senses.
Lexical Form
The Lexical Form of a Lexical Entry is described by its written representation. A Lexical Entry can have different forms, which are divided into the Canonical Form (the usual written form), the Other Form (which can be a different and less common spelling or, for example, an inflected form) and the Abstract Form (a form that is never realized as such, for example the stem of a word). By means of properties, the forms can be described in more detail. An example would be "property: plural number". Other commonly recorded features can be set accordingly. There is no need to select exactly one form.
Lexical Sense & Ontology
The Lexical Sense represents the relation between a lexical entry and an ontology entry, i.e. what the lexical entry means. In the case of homonyms or polysemous words, one Lexical Entry refers to more than one Ontology Entry. As one Ontology Entry can also be referred to by different Lexical Entries, there is a many-to-many relation between Lexical Senses and Ontology Entries.
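The components described above can be sketched with plain Python classes. This is a loose illustration of the lemon concepts, not the lemon vocabulary itself (which is defined in RDF); the class layout and the ontology identifiers such as `onto:PersonFromHamburg` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    written_rep: str
    kind: str = "canonical"          # "canonical", "other" or "abstract"
    properties: dict = field(default_factory=dict)


@dataclass
class LexicalSense:
    reference: str                   # identifier of an ontology entry (made up here)


@dataclass
class LexicalEntry:
    forms: list
    senses: list


@dataclass
class Lexicon:
    language: str
    entries: list = field(default_factory=list)


german = Lexicon("de", [
    LexicalEntry(
        forms=[Form("Hamburger")],
        senses=[LexicalSense("onto:PersonFromHamburg"),
                LexicalSense("onto:HamburgerSandwich")],
    ),
])
english = Lexicon("en", [
    LexicalEntry(
        forms=[Form("Hamburger"),
               Form("Hamburgers", kind="other", properties={"number": "plural"})],
        senses=[LexicalSense("onto:PersonFromHamburg")],
    ),
])


def entries_referring_to(reference, lexicons):
    """Return (language, canonical written form) of every entry with a sense for reference."""
    result = []
    for lexicon in lexicons:
        for entry in lexicon.entries:
            if any(sense.reference == reference for sense in entry.senses):
                canonical = next(f for f in entry.forms if f.kind == "canonical")
                result.append((lexicon.language, canonical.written_rep))
    return result


print(entries_referring_to("onto:PersonFromHamburg", [german, english]))
# [('de', 'Hamburger'), ('en', 'Hamburger')]
```

The German entry points to two ontology entries and one ontology entry is reached from two lexicons, which shows both directions of the many-to-many relation.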

Other features that may be useful for our purposes include the ability to represent words or multi-word terms as structured units. It is also possible to store information about the decomposition of words and about morphological composition. Attaching properties to relations may be useful as well. The project also provides modules for automatic generation, which are not covered here but may become interesting when deciding how automatically generated information is to be used.

Example

The following example, which also explains how translations can be handled, is taken from the lemon cook-book.

The left part of the figure above shows three different Lexicons (English, German, French). Each of them has an entry, which is a LexicalEntry ("cat: LexicalEntry" in English, "chat: LexicalEntry" in French, "katze: LexicalEntry" in German), and each of these LexicalEntry boxes points to a Form holding the written representation of the entry, i.e. the word itself. This relation carries the value "canonicalForm". As explained above, other forms could also be represented at the corresponding places in this scheme. Every Lexical Entry also points to a Sense, and these Senses are all linked to each other with the label "translationOf". Translation thus takes place between Senses. The Senses would point to the same Ontology Entry, as explained above, but this is not shown in the figure.


Proposals

Three main proposals have been made for restructuring/extending Wikidata in order to support Wiktionary.

Initial Proposal

The initial proposal was made by Denny Vrandečić and was first announced on June 19, 2013. It is based on the introduction of two new entity types in Wikidata: expression and sense.

While the typical Wikidata item may have a label in every language (the English label for Q1749 is "Copenhagen", the Danish label of the same item is "København" etc.), an expression would have only one label altogether. Since in this proposal the term (word or multi-word term) itself, together with its linguistic information, is what is of interest, there cannot be translated word forms in different languages for the same expression. An expression depends on the language it belongs to: the English word "Berliner" is a different expression than the German word "Berliner". The expression "Berliner" is thus tied to its morphological surface form (and does not have a different label such as the French translation Berlinois/e). For the German expression, there are (at least) two different meanings attached to it, "person from Berlin" and "doughnut with a sweet filling", regardless of their different etymologies. Likewise, it would remain the same expression if the meanings required a different pronunciation, hyphenation etc., as long as the spelling is identical. This is where the notion of sense, as explained in the terminology, comes in.

These short descriptions (such as "person from Berlin") are called glosses. The expression "Berliner (German)" therefore has two different senses, which can be glossed as "person from Berlin" and "doughnut with a sweet filling", respectively. The expression "Berliner (English)" has one sense, which can be glossed as "person from Berlin". In Wikidata, there would be two pages: "Berliner (English)" with the sense "person from Berlin", and "Berliner (German)" with the senses "person from Berlin" and "doughnut with a sweet filling".

Linguistic properties would be recorded as statements, and both expressions and senses can have statements. In the case of "Berliner (German)", a statement about the hyphenation of the word would be attached to the expression, whereas a statement about, for example, a synonym or a translation would need to be attached to the relevant sense. This proposal does not plan any special treatment of inflections. All derived forms (plurals, conjugated verbs etc.) would be separate expressions.

Alternative Proposal

The alternative proposal by User:Micru (David Cuenca) and User:Francis Tyers was a reaction to the first one and was announced on July 1, 2013. It is based on the introduction of two new entity types (defined meaning, bond) and one new data type (a paradigm).

One major difference from the initial proposal is how expressions and their senses are separated: in the initial proposal, all senses of an expression are listed together on one page, whereas the alternative proposal suggests one page per sense. As before, these entries would be language-dependent (that is, "Berliner (German) - person from Berlin" would be a different entity than "Berliner (English) - person from Berlin"). What is called a sense in the initial proposal is called a defined expression in this proposal. It is similar to OmegaWiki's defined meaning (DM) but not identical: the OmegaWiki DM is based on a translatable definition, whereas in this proposal a defined expression can have its own definition in each language.

The second new entity type, a bond, shall replace property-links to some extent. It represents certain statements as results of automatic searches and is thus partly built automatically, whenever an automatic search or inference allows it. Examples could be the automatic linking of exact translations or exact synonyms. Since such inferences come with certain difficulties (semantic drift etc.), the proposal distinguishes between strong links (for example exact meronymy) and weak links (for example near-synonymy) in order to handle these phenomena better. Paradigms are language-dependent sets of rules for automatically generating derived forms. In this proposal, the generated forms shall serve as aliases to the base form of the defined expression (and optionally be stored as “inflected forms”).
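The idea of a paradigm as a rule set that yields aliases can be sketched as follows. This is only an illustration under simplifying assumptions: the rule format (a label plus a suffix), the name `ENGLISH_REGULAR_NOUN`, and both functions are hypothetical and are not taken from the proposal, which does not specify how paradigms would be encoded.

```python
from typing import Dict, List, Tuple

# Hypothetical paradigm: each rule is (grammatical label, suffix to append).
# Real paradigms would need far richer rules; this is an assumption.
ENGLISH_REGULAR_NOUN: List[Tuple[str, str]] = [
    ("singular", ""),
    ("plural", "s"),
]


def generate_forms(base_form: str, paradigm: List[Tuple[str, str]]) -> Dict[str, str]:
    """Apply every rule of the paradigm to the base form."""
    return {label: base_form + suffix for label, suffix in paradigm}


def aliases_for(base_form: str, paradigm: List[Tuple[str, str]]) -> List[str]:
    """Derived forms that differ from the base form, usable as search aliases."""
    forms = generate_forms(base_form, paradigm)
    return sorted({form for form in forms.values() if form != base_form})


forms = generate_forms("Berliner", ENGLISH_REGULAR_NOUN)
aliases = aliases_for("Berliner", ENGLISH_REGULAR_NOUN)
```

Here the English expression "Berliner" keeps a single entry, and the generated plural is attached to it as an alias instead of becoming a page of its own.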

Third Proposal

Fig. 7: Example entries "Hamburger", Third Proposal
Fig. 8: Example entry third proposal

The third proposal emerged predominantly out of discussions about the initial and the alternative one. It was put forward on August 2, 2013 by Denny Vrandečić. This proposal uses a slightly different terminology which is introduced below. The terms sense and gloss, however, are defined the same way as in the terminology.

  • A lexeme, also known as word or lexical entry, is what is described on one page in the lexical part of Wikidata. A lexeme consists of a lemma, a lexical category, a language, a set of forms, a set of senses, and a set of statements.
    • The lemma is the canonical form or dictionary form of the lexeme, e.g. for verbs this is usually the infinitive form, for a noun the nominative singular, etc.
    • The lexical category, also known as the part of speech or word class, defines the lexeme to be either a noun, or a verb, or an adjective, etc. The set of possible values is open and taken from the Wikidata items.
    • The language of a lexeme is taken from Wikidata items, and thus an open set.
    • A form is a specific, fully conjugated or inflected form of the lexeme. A form consists of a representation, a set of lexical properties, and a set of statements. A form always belongs to one (and exactly one) lexeme.
      • A representation is the actual string value realizing a given form, e.g. the string value "wrote" for the past tense of the lexeme for "write". All representations are indexed for search.
      • A lexical property describes the form, e.g. tense or number for verbs, case for nouns, etc. This is an open set and points to Wikidata items.
    • A sense is described by a gloss and has a set of statements. A sense always belongs to one (and exactly one) lexeme (and lexemes belong to one language only). Senses are not independent of lexemes.
      • A gloss is a short description (translatable in all languages of the Wikidata UI) of one sense of the given lexeme.
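The terminology above describes a nested data structure, which can be sketched roughly as below. The class and field names follow the terms in the list, but the concrete types (plain strings for items, tuples for statements, a dictionary of glosses keyed by UI language) and the `matches` helper are assumptions made for illustration, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumption: a statement is reduced to a (property, value) pair of strings.
Statement = Tuple[str, str]


@dataclass
class Form:
    representation: str                      # the actual string, e.g. "wrote"
    lexical_properties: List[str] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Sense:
    gloss: Dict[str, str]                    # UI language code -> short description
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Lexeme:
    lemma: str
    lexical_category: str                    # would point to a Wikidata item
    language: str                            # would point to a Wikidata item
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)

    def matches(self, query: str) -> bool:
        """Hypothetical search helper: the lemma and all representations are indexed."""
        return query == self.lemma or any(f.representation == query for f in self.forms)


write = Lexeme(
    lemma="write",
    lexical_category="verb",
    language="English",
    forms=[Form("wrote", ["past tense"])],
)

hamburger_de = Lexeme(
    lemma="Hamburger",
    lexical_category="noun",
    language="German",
    senses=[
        Sense({"en": "person from Hamburg"}),
        Sense({"en": "hot sandwich consisting of a patty of cooked ground beef in a sliced bun"}),
    ],
)
```

Because forms and senses live inside the lexeme, a search for "wrote" finds the single page for "write", and the two German senses of "Hamburger" share one entry.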

The terms Wikidata item, property, string value, qualifier, statement, and claim are taken from the Wikidata glossary and have the same meaning here. See also the further glossary.


The most important changes compared with the previous proposals are the following:

The ”basic” unit is the lexeme. It is not the expression, as in the initial proposal, where each morphological form was a separate expression with its own entry page. It is not the sense, as in the alternative proposal (a sense may be only one part of the lexeme if the lexeme is polysemous or homonymous). Nor is it the language-independent surface form, as is the case in Wiktionary.

Senses, forms and lexemes can have statements. This differs from the initial proposal to the extent that in the initial one, a separate entry for all derived forms was demanded. In the third one, inflections are ”alternative forms” that may but do not need to have statements separately from their lemma. While the alternative proposal suggested statements on sense-level (and depending on the implementation of inflections statements on either all or no inflected forms), in the third proposal, it is possible to decide where a statement is the most useful. This way, all necessary differentiations can still be drawn but shared information can be stored less redundantly.

Inflections are handled as aliases for search and do not need to have a separate entry. This is similar to the alternative proposal. However, in the third one, decisions about what may be computed automatically - for example via paradigms - are postponed to a stage where there is enough linguistic data in Wikidata for a more detailed discussion on this matter. Figure 8 shows the example entry, taken from the proposal, with more detail than the more schematic "Hamburger"-Example.

Hamburger Example

The "Hamburger"-example would in this case be represented as in Figure 7.

Overview

This table compares some details of the three approaches.

Proposal | Entry | Handling of inflections | Statements | Storage in Wikidata
Initial proposal (expression; each morphological form) enters separately for each expression and each sense, using statements; stored as an inflected item attached to the third (sense) by aliases, with all of its forms attached to the lexeme by means of the form; statements may be used on the lexeme and on the sense.


Further Wikidata-Terminology

The following are taken from the Wikidata glossary and are shortened at some points.

claim
A claim is a piece of data about the entity in question. A claim consists of a property (such as "location") and a value (e.g. "Germany"), or some other relation or a composite or missing value. A claim can have qualifiers, such as specific details stating that the claim holds during a specific time. Compared with the triples used in linked data, a claim uses a property to represent an approximate triple predicate and a value to represent the triple object. Claims form part of the statements on item pages.
item
A Wikidata item is a page in the Wikidata main namespace that represents a topic, concept, or object. Items are identified by an ID, by a sitelink to an external page, or by a unique combination of multilingual label and description. Items may also have aliases to ease lookup. The main part of an item is the list of statements about it. An item can be viewed as the subject part of triples in linked data.
property
A Wikidata property (translated differently in some languages) is the descriptor for a data value, or some other relation or special value, but not the data value or values themselves. Each statement on an item page links to a property and assigns the property one or more values, or some other relation or a composite or missing value.
qualifier
A qualifier is a part of a claim that says something about that specific claim, usually in a descriptive way. A qualifier can be a term from a specific vocabulary, but it can also be a unique descriptive phrase.
statement
A statement is a piece of data about an item, recorded on the item's page. A statement consists of a claim (a property-value pair such as "Location: Germany", together with optional qualifiers), augmented by optional references (giving the source for the claim) and an optional rank (used to distinguish between several claims containing the same property). Wikidata makes no assumptions about the correctness of statements; it merely collects and reports them with a reference to a source.
string
A string (short for character string) is a general term for a sequence of freely chosen characters interpreted as text (e.g. "Hello"), as opposed to data interpreted as a numerical value (3.14), a link to an item (e.g. Q1234) or a more complex datatype (the set {1,3,5,7}). In addition to a string datatype, Wikidata will support language-specific texts, "monolingual-text" and "multilingual-text", as the value of a property.
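How the glossary terms claim, qualifier, and statement fit together can be sketched as below. This is a simplified illustration, not Wikidata's actual data model: the classes, the use of plain strings for properties and values, and the example qualifier, reference, and rank values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Claim:
    property: str                            # e.g. "Location"
    value: Optional[str]                     # None stands in for a missing value
    qualifiers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Statement:
    claim: Claim
    references: List[str] = field(default_factory=list)   # sources for the claim
    rank: str = "normal"                                   # assumed default rank


# Hypothetical example values; only "Location: Germany" comes from the glossary.
location = Statement(
    claim=Claim("Location", "Germany", {"valid during": "some specific time"}),
    references=["hypothetical source"],
)


def is_sourced(statement: Statement) -> bool:
    """A statement is sourced if it carries at least one reference."""
    return bool(statement.references)
```

The split mirrors the glossary: the claim holds what is asserted, while references and rank belong to the statement that wraps it.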