Wikidata:Lexicographical data/Documentation

This is the main documentation page for lexicographical data on Wikidata. It is intended to describe general information about Wikidata lexemes: how they are structured, how they can be edited, and what can be added to enrich them.

Note that while the information on this page may be broadly applicable across most languages, what works for modeling one language will not always work for modeling another language. For information about modeling lexemes for specific languages, visit the documentation pages for them.

Further technical documentation can be found separately for the WikibaseLexeme extension for MediaWiki, which provides support for lexemes on Wikidata.

A Glossary of Wikidata Lexicographical terms is available.

Data model

 
Visualization of the lexeme data model

The WikibaseLexeme data model describes the structure of the data handled as "lexemes" in Wikidata. The following text is a brief summary; for more detailed information, see the corresponding documentation page on WikibaseLexeme.

A lexeme is a lexical element of a language, such as a word, a phrase, or a prefix. (More information about lexemes in general can be found at Lexeme on Wikipedia.)

Each lexeme consists of seven components, each described in one of the following subsections:

  1. its identifier;
  2. its lemmata;
  3. its language;
  4. its lexical category;
  5. its (top-level) statements;
  6. its senses; and
  7. its forms.
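
As a rough illustration, the seven components could be modelled with plain Python dataclasses. This is only a sketch: the class and field names are hypothetical and are not the names used by WikibaseLexeme, and the QID given for the lexical category is assumed to be the item for "noun".

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    sense_id: str                                            # e.g. "L3746552-S4"
    glosses: dict[str, str] = field(default_factory=dict)    # language code -> gloss
    statements: list[dict] = field(default_factory=list)


@dataclass
class Form:
    form_id: str                                             # e.g. "L3746552-F4"
    representations: dict[str, str] = field(default_factory=dict)  # language code -> string
    grammatical_features: list[str] = field(default_factory=list)  # QIDs of feature items
    statements: list[dict] = field(default_factory=list)


@dataclass
class Lexeme:
    lexeme_id: str                                           # 1. identifier
    lemmas: dict[str, str]                                   # 2. lemmata, keyed by language code
    language: str                                            # 3. QID of the language item
    lexical_category: str                                    # 4. QID of the lexical category item
    statements: list[dict] = field(default_factory=list)     # 5. top-level statements
    senses: list[Sense] = field(default_factory=list)        # 6. senses
    forms: list[Form] = field(default_factory=list)          # 7. forms


# The English lexeme for 'umbrella' with only its basic components filled in.
umbrella = Lexeme(
    lexeme_id="L3435",
    lemmas={"en": "umbrella"},
    language="Q1860",            # English
    lexical_category="Q1084",    # assumed here to be the item for "noun"
)
print(umbrella.lemmas["en"])
```

A lexeme built this way starts with empty lists of statements, senses, and forms, which mirrors a lexeme that has just been created.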

Lexeme ID

Lexemes have identifiers starting with an "L" followed by a number using the digits 0-9, such as L3746552. These IDs (often called "LIDs", for "lexeme identifiers") are unique within Wikidata and are assigned automatically when a lexeme is created.

The RDF URI for a lexeme is http://www.wikidata.org/entity/ followed by the lexeme ID.
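
A minimal sketch of checking a lexeme ID and building its RDF URI, following the description above. The function and constant names are hypothetical, and the pattern only encodes what is stated here ("L" followed by digits); it does not check that the lexeme exists.

```python
import re

ENTITY_URI_PREFIX = "http://www.wikidata.org/entity/"
LEXEME_ID_PATTERN = re.compile(r"L[0-9]+")


def is_lexeme_id(value: str) -> bool:
    """Return True if the whole string looks like a lexeme ID such as 'L3746552'."""
    return LEXEME_ID_PATTERN.fullmatch(value) is not None


def lexeme_uri(lexeme_id: str) -> str:
    """Build the RDF URI for a lexeme ID, rejecting strings that are not LIDs."""
    if not is_lexeme_id(lexeme_id):
        raise ValueError(f"not a lexeme ID: {lexeme_id!r}")
    return ENTITY_URI_PREFIX + lexeme_id


print(lexeme_uri("L3746552"))  # http://www.wikidata.org/entity/L3746552
```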

Lexeme lemmata

The lemmata (singular lemma) of a lexeme are primarily used as human-readable representations of the lexeme. Each lemma consists of a string accompanied by a valid IETF language tag. Usually lemmata are the written forms of a word, phrase, or affix that would be found in a dictionary describing them, whether or not they are considered the 'base' or 'stem' forms morphologically.

  • e.g. the English lexeme Lexeme:L3435 has the lemma 'umbrella' because most English dictionaries provide information about this lexeme under the heading 'umbrella' and not under something like 'umbrellas' or "umbrella's" or "umbrellas'".
  • e.g. the Italian lexeme Lexeme:L1196965 has the lemma 'volare' because most Italian dictionaries provide information about it under that heading and not under something like 'volo', 'volante', or 'volato'.
  • e.g. the Korean lexeme Lexeme:L17 has the lemma '먹다' because most Korean dictionaries provide information about it under that form, rather than something like '먹-', '먹어', or even '먹습니다'.

Lexemes can have several lemmata, particularly when there are differences in the writing system or other orthographic conventions within a given language. Different lemmata are indicated with different language tags, and a lexeme may only have one lemma for a given language tag.

  • e.g. the Hindustani lexeme Lexeme:L641622 has two lemmata, 'चाचा' with code hi and 'چاچا' with code ur, which are representations of the same dictionary form (pronounced /t͡ʃɑː.t͡ʃɑː/) in the Devanagari script (used for Hindi) and the Arabic script (used for Urdu).
  • e.g. the Hebrew lexeme Lexeme:L63672 has two lemmata, 'אדום' with code he and 'אָדֹם' with code he-x-Q21283070, which reflect differences in how the same word form is spelt depending on whether diacritics are present.
  • e.g. the Southern Min lexeme Lexeme:L308008 has three lemmata, '城市' with code nan-hani, 'siânn-tshī' with code nan-x-Q56929, and 'siâⁿ-chhī' with code nan-x-Q559173. These represent using either Chinese characters or one of two romanization systems, each corresponding to the same word form.
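
The rule that a lexeme may have only one lemma per language tag can be pictured as a mapping from language tag to lemma string. The sketch below uses a hypothetical helper, `add_lemma`, that refuses a second lemma for a tag already in use; it does not validate the language tags themselves.

```python
def add_lemma(lemmas: dict[str, str], language_tag: str, text: str) -> None:
    """Add a lemma, enforcing at most one lemma per language tag."""
    if language_tag in lemmas:
        raise ValueError(f"a lemma with tag {language_tag!r} already exists")
    lemmas[language_tag] = text


# The two lemmata of the Hindustani lexeme mentioned above.
lemmas: dict[str, str] = {}
add_lemma(lemmas, "hi", "चाचा")
add_lemma(lemmas, "ur", "چاچا")
print(sorted(lemmas))  # ['hi', 'ur']
```

Calling `add_lemma(lemmas, "hi", ...)` again would raise a `ValueError`, since the tag `hi` is already taken.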

Note that some of the language codes above contain an '-x-' in them. There are two main reasons this would be present in a language code:

  1. For languages whose language codes are not yet supported, the last-resort option is to use the base code mis followed by a private-use subtag containing the QID of the Wikidata item for the language.
  2. If a language has a supported language code, but a variation whose language code is not supported, the private-use subtag may be attached directly to the existing supported code.
    • e.g. lexemes in the Varendri (Q48726757) dialect of Bengali, such as Lexeme:L672268, have a lemma with the code bn-x-Q48726757 (where 'bn' is the existing supported code).
    • e.g. lemmata in Devanagari Sindhi (Q116688933) for lexemes in Sindhi use the language code sd-x-q116688933 (where 'sd' is the existing supported code).
    • e.g. lemmata in the Adlam (Q19606346) script for lexemes in Fula use the language code ff-x-q19606346 (where 'ff' is the existing supported code).
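
The two cases above can be sketched as one small helper that joins a base code, the private-use marker "x", and a QID. The helper name is hypothetical, and the sketch keeps the QID exactly as given (the examples above show both upper-case and lower-case "q" in use).

```python
import re

QID_PATTERN = re.compile(r"[Qq][0-9]+")


def private_use_code(language_qid: str, base_code: str = "mis") -> str:
    """Build a language code with a private-use subtag that carries a QID.

    With no base code given, the last-resort base code 'mis' is used.
    """
    if QID_PATTERN.fullmatch(language_qid) is None:
        raise ValueError(f"not a QID: {language_qid!r}")
    return f"{base_code}-x-{language_qid}"


# Case 2: a variety of a language that already has a supported code.
print(private_use_code("Q48726757", base_code="bn"))  # bn-x-Q48726757
```

For case 1, calling the helper with only a QID yields a code of the shape `mis-x-Q…`.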

Lexeme lemmata are what are displayed when using the {{L}} template to link to a lexeme on Wikidata (including later on this page).

Lexeme language

The language to which a lexeme belongs is a reference to a Wikidata item for a language.

For most languages, this is a straightforward determination: English (Q1860), Thai (Q9217), Manchu (Q33638), and Gun (Q3111668) are just four of the many possibilities, since they have supported language codes en, th, mnc, and guw.

For some languages, however, lexemes are now required to use particular language items; see the documentation pages for those languages for more information.

Lexical category

The lexical category to which a lexeme belongs is a reference to a Wikidata item for a particular group of words with specific syntactic behavior in a language. This usually corresponds with the "part of speech" of the lexeme: nouns, verbs, adjectives, adverbs, and so on.

Different languages may necessarily use different lexical categories, but some are frequent enough across languages that a comparison may be made. The following table, when expanded, provides examples of lexemes in each language falling into some of the most common lexical categories across Wikidata lexemes.

Sample lexemes by language and lexical category
verb noun pronoun adjective adverb preposition postposition conjunction interjection numeral determiner grammatical particle
Arabic ذهب (L7882) كِتاب (L2233) أنا (L7883) جميل (L7884) عادَة (L7885) فِي (L2452) لَكِنَّ (L7886) يَعْنِي (L7887) واحِد (L7891) هذا (L7892)
English go (L3006) book (L536) I (L487) beautiful (L3360) usually (L4114) in (L2987) ago (L3240) but (L1387) oh (L4327) one (L327) this (L2994)
German wissen (L2058) Zukunft (L80) ich (L7877) ausgezeichnet (L530) querbeet (L7059) in (L6748) aber (L7879) ach (L7889) eins (L7880) dieser (L7881)
Korean 먹다 (L17) 사람 (L130) (L246) 괴롭다 (L100) 함께 (L168) 가만 (L86) / (L83) 고전적/古典的 (L49)
Spanish ir (L7385) libro (L317) yo (L55951) hermoso (L55952) normalmente (L55953) en (L11741) N/A pero (L55954) oh (L692468) uno (L44969) esto (L55955)
French aller (L750) livre (L6873) je (L9094) beau (L7026) toujours (L9105) dans (L9148) mais (L9261) merci (L11618) un (L9167) ce (L9203)
New Persian رفتن/рафтан/raftan (L2921) کتاب/китоб (L226813) من/ман (L2377) زیبا/зебо (L238420) معمولاً/маъмулан (L749792) در/дар (L230487) اما/аммо (L678620) آخ (L749794) یک/як (L303349) این/ин (L742781)
Russian быть (L2111) вода (L189) я (L2027) хороший (L10951) хорошо (L10948) в/въ (L2109) N/A и (L2108) всё (L2115) три (L32930) N/A не (L2110)
Swedish göra (L38963) boll (L32310) han (L35645) listig (L39404) ofta (L35726) (L35650) - och (L35648) hej (L246342) fem (L46944) den (L47066) ju (L53540)
Punjabi ਸਕਣ/سکݨ (L689075) ਡੱਡੂ/ڈڈّو (L678986) ਉਹ/اوہ (L686605) ਕਾਲਾ/کالا (L684186) ਨਹੀਂ/نہیں (L686542) - ਵਿਚ/وِچ (L679728) ਕਿਉਂਕਿ/کیوں کہ (L686369) ਆਹੋ/آہو (L689404) - ਇਕ/اِک (L686328) ਤਾਂ/تاں (L686341)
Italian amare (L5137) casco (L580895) io (L21271) bizzarro (L1199728) amichevolmente (L1155269) con (L7405) N/A o (L2779) ciao (L313550) otto (L5161)

Lexeme statements

Lexemes, like items or properties, have statements (claims) that provide information about the lexeme that is not specific to one of its forms or senses. Depending on how a particular language works, and depending on the lexical category of the lexeme, some statements will be more applicable to a given lexeme than others.

Many common properties applicable directly to lexemes are listed in Template:Lexicographical properties.

Lexeme senses

Senses describe the different meanings of a lexeme.

A sense consists of three parts: 1) the sense ID, 2) glosses, and 3) statements.

  1. The sense ID starts with the ID of the lexeme it belongs to, followed by a hyphen ("-") and an "S", followed by a natural number in decimal notation: e.g. L3746552-S4. These IDs are unique within Wikidata; when a new sense is created within a lexeme, an entirely new sense ID is provided for it. Like an LID, a sense ID may be appended to http://www.wikidata.org/entity/ to form a unique URI for the sense.
  2. Glosses define the meaning of the sense using natural language. For a lexeme in a given language X, the gloss in language X should be a more detailed explanation of the meaning of the sense, while the glosses in other languages Y and Z may be less detailed, so long as they are clear enough to speakers of Y and Z what the meaning of the sense is.
  3. Like lexemes, items, and properties, senses can have statements further describing the sense and its relations to other senses and to Wikidata items.

Many common properties applicable to lexeme senses are listed in Template:Lexicographical properties.
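
The structure of a sense ID described above (lexeme ID, hyphen, "S", number) can be sketched as a small parser. The function names are hypothetical; the sketch only checks the shape of the ID, not whether the sense exists.

```python
import re

ENTITY_URI_PREFIX = "http://www.wikidata.org/entity/"
SENSE_ID_PATTERN = re.compile(r"(L[0-9]+)-S([0-9]+)")


def parse_sense_id(sense_id: str) -> tuple[str, int]:
    """Split a sense ID into the ID of its lexeme and its sense number."""
    match = SENSE_ID_PATTERN.fullmatch(sense_id)
    if match is None:
        raise ValueError(f"not a sense ID: {sense_id!r}")
    return match.group(1), int(match.group(2))


def sense_uri(sense_id: str) -> str:
    """Build the URI for a sense by appending its ID to the entity prefix."""
    parse_sense_id(sense_id)  # raises ValueError for malformed IDs
    return ENTITY_URI_PREFIX + sense_id


print(parse_sense_id("L3746552-S4"))  # ('L3746552', 4)
print(sense_uri("L3746552-S4"))       # http://www.wikidata.org/entity/L3746552-S4
```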

Lexeme forms

Forms describe the different realizations of a lexeme in speech or writing.

Depending on how a language behaves morphologically, there may be exactly one form of a lexeme or there may be multiple forms. In general, the more isolating or analytic or the more agglutinative or polysynthetic a language is, the more it may benefit from having one form per lexeme. Lexemes in many fusional languages typically have multiple forms for particular combinations of grammatical features.

A form consists of four parts: 1) the form ID, 2) form representations, 3) grammatical features, and 4) statements.

  1. The form ID starts with the ID of the lexeme it belongs to, followed by a hyphen ("-") and an "F", followed by a natural number in decimal notation: e.g. L3746552-F4. These IDs are unique within Wikidata; when a new form is created within a lexeme, an entirely new form ID is provided for it. Like an LID or a sense ID, a form ID may be appended to http://www.wikidata.org/entity/ to form a unique URI for the form.
  2. Form representations are strings, accompanied with language tags, that signify how a particular form is used. As with lemmata, there may be multiple representations on a single form to handle differences in writing system or orthographic variation within a language.
  3. Grammatical features are references to Wikidata items that define the syntactic circumstances in which a given form applies.
  4. Like lexemes, senses, items, and properties, forms can have statements further describing the form and its relations to other forms and to Wikidata items.

Many common properties applicable to lexeme forms are listed in Template:Lexicographical properties.
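
To illustrate how grammatical features pick out a form, the sketch below filters a list of forms by a set of required feature items. Everything here is hypothetical: the lexeme ID is made up, the dictionary layout is not the WikibaseLexeme format, and the two QIDs are assumed to be the items for "singular" and "plural".

```python
# QIDs assumed for illustration; check the actual items before relying on them.
SINGULAR = "Q110786"
PLURAL = "Q146786"

# Hypothetical forms of a made-up English noun lexeme.
forms = [
    {"id": "L999999999-F1", "representations": {"en": "umbrella"}, "features": {SINGULAR}},
    {"id": "L999999999-F2", "representations": {"en": "umbrellas"}, "features": {PLURAL}},
]


def forms_with_features(forms: list[dict], wanted: list[str]) -> list[dict]:
    """Return the forms whose grammatical features include every wanted feature."""
    wanted_set = set(wanted)
    return [form for form in forms if wanted_set <= form["features"]]


plural_forms = forms_with_features(forms, [PLURAL])
print([form["representations"]["en"] for form in plural_forms])  # ['umbrellas']
```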

Lexeme inclusion criteria

In some cases or languages, there may be multiple entities for related words, whereas in other languages there may be just one. The table below provides an overview of how nouns in particular may be linked:

One or several lexemes for nouns?
difference in | 1 lexeme | 2+ lexemes
sense | add several senses | add applicable sense to lexeme; link other(s) with homograph lexeme; duplicate forms on each
etym. | add etym. to each sense | add etym. to lexeme base; link other(s) with homograph lexeme; duplicate forms on each
gender | add gender to each sense | add gender to lexeme base; link other(s) with homograph lexeme; duplicate forms on each
common/proper | add several senses; use lexical category "noun" | add applicable sense to lexeme; link other(s) with homograph lexeme; duplicate forms on each
caps/lowercase | add several forms; qualify forms to applicable senses | add applicable sense to lexeme; link other(s) with homograph lexeme; add only applicable forms
singular/plural | add several forms; qualify forms to applicable senses | add applicable sense; if possible link other(s) with homograph lexeme; add only applicable forms
pronunciation | add the same form twice; qualify forms to applicable senses, add pronunciation | add applicable sense; if possible link other(s) with homograph lexeme; add form and applicable pronunciation
forms/spelling | add several forms or alternate forms; qualify forms to applicable senses | add applicable sense; if possible link other(s) with homograph lexeme; add only applicable forms

For a given language and criterion (first column), just one of the two might apply.

Interface

The following section details steps to take in Wikidata's user interface to perform common tasks involving editing lexemes.

Lexemes

 
Screenshot of the lexeme creation page (as it appeared before November 2022)

Create a new lexeme

  1. Go to Special:NewLexeme.
  2. Under Lemma, enter a lemma (see #Lexeme lemmata for more information).
  3. Under Lexeme's language, enter the language of the lexeme, either by typing the name of the language or its QID (see #Lexeme language for more information).
    1. If you are prompted to do so, under Spelling variant of the Lemma, enter the language code of the lemma (see #Lexeme lemmata for more information).
  4. Under Lexical category, enter the lexical category of the lexeme, either by typing its name or its QID (see #Lexical category for more information).
  5. Click "Create" to save your changes.

You have now created a lexeme with only the most basic information. Because it is nearly empty, it cannot be used meaningfully until more information is added to it, such as statements, senses, and forms (described later on this page).

Edit a lexeme's lemmata, language, or lexical category

 
Screenshot of the top of a Lexeme page
  1. Next to the lemmata, click the 'edit' button.
  2. Lemmata may be edited as follows:
    1. To add a lemma, first select the "+" that appears beside the lemmata.
    2. In the new lemma, under Lemma, add the representation of the new lemma.
    3. Also in the new lemma, under Spelling variant, add the language code of the new lemma.
    4. To remove a particular lemma, simply select the "x" appearing beside Lemma in that lemma.
  3. To change the language of the lexeme, use the search box appearing beside Language to pick an item for a language.
  4. To change the lexical category of the lexeme, use the search box appearing beside Lexical category to pick an item for a lexical category.
  5. Click "publish" to save your changes.

Add, edit, or remove statements on a lexeme

 
Screenshot of the interface to edit a statement

Adding a statement to a lexeme entails the following steps:

  1. Click "add statement".
  2. Enter a property, typing its name in the property field (such as derived from lexeme) and selecting it in the suggester.
  3. Enter a value for the property.
    Note: A Wikidata property for lexicographic senses (Q54275340) such as translation (P5972) or synonym (P5973) does not currently support searching for senses, either by lexeme lemmata or sense glosses. This means in order to enter a value for a statement, you need to enter the precise sense ID for the sense you want as a value.
     
    As seen here, Wikidata will not be able to find Lexemes and their senses when searching by their name.

     
    Searching by a precise Lexeme Sense ID however returns a publishable result.
  4. If you wish to add qualifiers and references to the statement, feel free to do so.
  5. Save the statement by clicking "publish".
  6. To edit a statement, click "edit".
  7. To remove a statement, click "edit", and then "remove".

Delete a lexeme

To delete a lexeme, you may request its deletion at Wikidata:Requests for deletions, just as is done with items. If you have the Merge gadget enabled, you may submit deletion requests for lexemes using it.

Search for a lexeme

To look for a lexeme via Special:Search or the search box on any page, you may use its LID, one of its lemmata, or a representation of one of its forms.

The simplest way to do this is to prefix "L:" to one of these, and you will automatically see results in the lexeme namespace for your search. For example, lexeme L301993 has the lemma "হৃদয়" and one of its forms has the representation "হৃদয়েতে". Searching for "L:L301993", "L:হৃদয়", or "L:হৃদয়েতে" will return the same lexeme in the results.

You may alternatively search without the "L:" prefix (e.g. using "L301993", "হৃদয়", or "হৃদয়েতে"), then select the "Lexeme" namespace under "Search in:" and rerun the search to get the same lexeme returned.

Note that the selector (the drop-down menu that pops up to suggest results) does not support the lexeme namespace yet. Pressing Enter or clicking the search icon after typing your keyword, however, will show you the results.
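
As a small illustration of the "L:" prefix, the sketch below builds a Special:Search URL for a lexeme search term. The helper name is hypothetical, and the index.php endpoint with `title` and `search` parameters is an assumption based on the usual MediaWiki URL layout; the sketch only constructs the URL and does not send a request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint, following the usual MediaWiki URL layout.
SEARCH_ENDPOINT = "https://www.wikidata.org/w/index.php"


def lexeme_search_url(term: str) -> str:
    """Build a Special:Search URL whose query is the term prefixed with 'L:'."""
    query = urlencode({"title": "Special:Search", "search": f"L:{term}"})
    return f"{SEARCH_ENDPOINT}?{query}"


# Any of the LID, a lemma, or a form representation can be used as the term.
for term in ("L301993", "হৃদয়", "হৃদয়েতে"):
    url = lexeme_search_url(term)
    # Decode the query again to show what the search box would receive.
    print(parse_qs(urlsplit(url).query)["search"][0])
```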

Senses

Create a new sense

  1. In the Senses section of a lexeme, click "add Sense".
  2. Under Language, enter a language code for the gloss.
  3. Under Gloss, enter the gloss.
  4. To add new glosses, click "add" and repeat steps 2 and 3.
  5. Click "publish" to save your changes.

Edit the glosses of a sense

  1. Next to the sense glosses, click "edit".
  2. To add a new gloss, do the following:
    1. Underneath the existing sense glosses, click the smaller "add" link. (Be careful that you do not accidentally click on the add statement or add Sense links used to add a new statement or sense instead!)
    2. Under Language, enter a language code for the new gloss.
    3. Under Gloss, enter the new gloss.
    4. Repeat these steps for each new gloss you wish to add.
  3. To remove a gloss, click "remove" next to the gloss.
  4. Click "publish" to save your changes.


Delete a sense

  1. Next to the sense glosses, click "edit".
  2. Click "remove".

Forms

 
Adding a form

Create a new form

  1. In the Forms section of a lexeme, click "add Form".
  2. Under Representation, fill in a representation for the new form.
  3. Under Spelling variant, fill in the language code for that representation.
  4. To add more representations, click the "+" next to the existing representations and repeat steps 2 and 3 for the new representation.
  5. Next to Grammatical features, enter one or several grammatical features, by typing their name and selecting them in the list of items that appears.
  6. Click "publish" to save your changes.

Edit a form's representations or grammatical features

  1. Next to the form's representations, click "edit".
  2. Representations may be edited as follows:
    1. To add a representation, first select the "+" that appears beside the representations.
    2. In the new representation, under Representation, add the new representation for the form.
    3. Also in the new representation, under Spelling variant, add the language code for that representation.
    4. To remove a particular representation, simply select the "x" appearing beside Representation in that representation.
  3. To add a grammatical feature, type its name at the end of the text box and select the appropriate item in the list of items that appears.
  4. To remove a grammatical feature, click the "x" that appears next to it.
  5. Click "publish" to save your changes.

Delete a form

  1. Next to the form's representations, click "edit".
  2. Click "remove".

Features

See also: Wikidata:Lexicographical data/Development

What is included in the first version

  • New datatypes: Lexeme, Form
  • Add, edit, delete Lexemes
  • Add, edit, delete Forms
  • Add, edit, delete statements
  • Add, edit, delete qualifiers
  • Add, edit, delete references
  • Linking to an Item from a Lexeme or a Form
  • Linking to another Lexeme from a Lexeme, a Form or an Item
  • Search and suggestions when entering a value
  • Basic internal APIs (used for UI, you should not use them)

What will be added in the future

Ordered from near-term to long-term plans:

  • Search for content with Special:Search (done)
  • Display the lemma in the history pages, recent changes and watchlist (done)
  • Add, edit, delete Senses (done)
  • RDF support and ability to query the data on query.wikidata.org (done)
  • Better API support
  • Automatic generation of Forms
  • Data access on clients (other Wikimedia projects) (done)
  • Editing data directly from Wiktionary

See also