User:Lea Lacroix (WMDE)/DraftDoc

Draft for Lexicographical data/Documentation page.

Template work in progress?

This is the main documentation page for lexicographical data on Wikidata. Since the new data system is not deployed yet, this documentation is incomplete and mostly based on the test system.

See also the technical documentation on the WikibaseLexeme extension.

Introduction

Data Model

 
visualization of the Lexeme data model

The data model of WikibaseLexeme describes the structure of the data that is handled as "Lexemes" in Wikibase. What follows is a summary; for more detailed information, see mw:Extension:WikibaseLexeme/Data Model.

A Lexeme is a lexical element of a language, such as a word, a phrase, or a prefix (see Lexeme on Wikipedia). Lexemes are Entities in the sense of the Wikibase data model. A Lexeme is described using the following information:

  • An ID. Lexemes have IDs starting with an "L" followed by a natural number in decimal notation, e.g. L3746552. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Lexeme.
  • A Lemma for use as a human-readable representation of the lexeme, e.g. "run".
  • The Language to which the lexeme belongs. This is a reference to a concrete Item, e.g. Q1860 for English.
  • The Lexical category to which the lexeme belongs. This is given as a reference to a concrete Item, e.g. Q34698 for adjective.
  • A list of Statements to describe properties of the lexeme that are not specific to a Form or Sense (e.g. derived from, grammatical gender, or syntactic function).
  • A list of Forms, typically one for each relevant combination of grammatical features, such as 2nd person / singular / past tense. A Form is described using the following information:
    • An ID. Forms have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "F", followed by a natural number in decimal notation, e.g. L3746552-F7.
    • A representation, spelling out the Form as a string.
    • A list of grammatical features that define for which syntactic role the given Form applies. These are given as references to concrete Items, e.g. Q814722 for participle.
    • A list of Statements further describing the Form or its relations to other Forms or Items (e.g. pronunciation audio, rhymes with, used until, used in region).
  • A list of Senses, describing the different meanings of the lexeme (e.g. "financial institution" and "edge of a body of water" for the English noun bank). A Sense is described using the following information:
    • An ID. Senses have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "S", followed by a natural number in decimal notation, e.g. L3746552-S4. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Sense.
    • A Gloss, defining the meaning of the Sense using natural language.
    • A list of Statements further describing the Sense and its relations to Senses and Items (e.g. translation, synonym, antonym, connotation, register, denotes, evokes).
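The structure above can be sketched in code. The following Python example is only an illustration of the summary on this page, not the WikibaseLexeme implementation: the class names, field names, the helper `concept_uri`, and the representation of Statements as plain dictionaries are all assumptions made for this sketch. The base URI shown is the one Wikidata uses for entities; other repositories would have their own. The IDs in the usage part are the placeholder examples from the list above and do not describe a real, consistent Lexeme.

```python
import re
from dataclasses import dataclass, field
from typing import List

# ID shapes taken from the description above: "L<number>",
# "L<number>-F<number>" and "L<number>-S<number>".
# In the Form and Sense patterns, group 1 captures the owning Lexeme's ID.
LEXEME_ID = re.compile(r"L[0-9]+")
FORM_ID = re.compile(r"(L[0-9]+)-F[0-9]+")
SENSE_ID = re.compile(r"(L[0-9]+)-S[0-9]+")

# Assumption: Wikidata's concept base URI; other repositories use their own.
CONCEPT_BASE_URI = "http://www.wikidata.org/entity/"


@dataclass
class Form:
    id: str
    representation: str
    grammatical_features: List[str] = field(default_factory=list)  # Item IDs
    statements: List[dict] = field(default_factory=list)  # simplified


@dataclass
class Sense:
    id: str
    gloss: str
    statements: List[dict] = field(default_factory=list)  # simplified


@dataclass
class Lexeme:
    id: str
    lemma: str
    language: str  # Item ID, e.g. "Q1860" for English
    lexical_category: str  # Item ID, e.g. "Q34698" for adjective
    statements: List[dict] = field(default_factory=list)  # simplified
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)

    def __post_init__(self):
        if not LEXEME_ID.fullmatch(self.id):
            raise ValueError(f"not a Lexeme ID: {self.id!r}")

    def _check_owner(self, pattern, sub_id):
        # A Form or Sense ID must start with the ID of this Lexeme.
        match = pattern.fullmatch(sub_id)
        if match is None or match.group(1) != self.id:
            raise ValueError(f"{sub_id!r} does not belong to {self.id}")

    def add_form(self, form: Form) -> None:
        self._check_owner(FORM_ID, form.id)
        self.forms.append(form)

    def add_sense(self, sense: Sense) -> None:
        self._check_owner(SENSE_ID, sense.id)
        self.senses.append(sense)


def concept_uri(entity_id: str, base: str = CONCEPT_BASE_URI) -> str:
    """Combine an entity ID with a repository's concept base URI."""
    return base + entity_id


# Usage with the placeholder IDs from the list above.
lexeme = Lexeme(id="L3746552", lemma="run", language="Q1860",
                lexical_category="Q34698")
lexeme.add_form(Form(id="L3746552-F7", representation="running",
                     grammatical_features=["Q814722"]))
lexeme.add_sense(Sense(id="L3746552-S4", gloss="placeholder gloss"))
print(concept_uri(lexeme.senses[0].id))
```

Checking the ID prefix in `add_form` and `add_sense` mirrors the rule that Form and Sense IDs start with the ID of the Lexeme they belong to; a Form such as `L1-F1` would be rejected by the Lexeme above with a `ValueError`.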

Interface

TBD: include a screenshot of the Lexeme interface.

Lexeme

Create a new Lexeme
Edit a Lexeme
Add information to a Lexeme
Delete information from a Lexeme
Delete a Lexeme

Form

Create a new Form
Edit a Form
Add information to a Form
Delete information from a Form
Delete a Form

Features

What is included in the first version

  • Add, edit, delete Lexemes
  • Add, edit, delete Forms
  • Add, edit, delete statements
  • Add, edit, delete qualifiers
  • Add, edit, delete references
  • Search for content in the search field and value field
  • Linking to a Lexeme or a Form from an Item
  • Basic internal APIs

What will be added in the future

Ordered from near-term to long-term plans:

  • Add, edit, delete Senses (Senses will not be included in the first version)
  • RDF support and ability to query the data on query.wikidata.org
  • Better API support
  • Automatic generation of Forms
  • Data access on clients (other Wikimedia projects)
  • Editing data directly from Wiktionary