Wikidata:Lexicographical data/Welcome/Indian languages

Welcome, Indian linguaphiles!

The lexicographical data project of Wikidata is a volunteer project to store word data for all languages in a structured database. Anyone can add words in their language with ease once they are familiar with the project and its conventions; no technical skills are needed. So join in and edit!

The goal of this project is to store all words of all languages in a translingual structured machine-readable database under a public domain (CC0) license.

The Lexeme namespace of Wikidata is the place where this word data is stored: words and phrases of all languages, kept under a structured data model that both humans and machines can read, and released under a public domain license that anyone can use without restriction.

Why important?

  1. Provides lexical support for different linguistic projects such as Wiktionary (more than 182 versions), automatic translations (for Abstract Wikipedia, etc.), and other projects
  2. Digitizes words in a structured machine-readable model: this project aims to digitize all the words and the meanings that they convey. Each individual lexical unit gets an entry and will have its different forms listed.
  3. Word documentation: Over half of the languages of the world are dying at an accelerating rate. Digitizing and documenting their words helps to preserve and revitalize these languages.
  4. Helps automatic translation for the Abstract Wikipedia project in all languages, using code from the Wikifunctions project.
    • Provides translations for vulnerable languages, helping to revive them
    • Improves the existing machine translations available for major languages
  5. Translingual and language-independent: The lexemes are stored in a language-independent, translingual manner using L-IDs, Q-IDs, etc., which have labels in all languages. The lexicographical data project strives to describe the words of each language on that language's own terms, without depending on any major language.
  6. Supports language tools: As the words are entered in a free machine-readable model, various language tools can be built on top of it. Examples: text-to-lexemes and language detection tools

What are lexemes?

A lexeme is the abstract "word" underlying a set of inflections. It refers to the set of all the forms that share the same meaning. For example, in English, run, runs, ran and running are forms of the same lexeme "run". Grammatical forms include those marking tense, number (e.g., plural), case (e.g., possessive case), and other inflections.

For example, the lexeme "run" includes as members "run" (the lemma), "running" (an inflected form), and "ran", but excludes "runner" (a derived term). (Source: English Wiktionary.)

The headword used to represent a lexeme is called a "lemma" (also known as the dictionary form).
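The relationship between a lexeme, its lemma and its forms can be sketched in a few lines of Python. This is only an illustration of the concept, not the Wikidata data model; the class name `Lexeme` and the method `has_form` are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """Toy model: one lemma (dictionary form) plus its inflected forms."""
    lemma: str
    forms: set[str] = field(default_factory=set)

    def has_form(self, word: str) -> bool:
        # The lemma itself always counts as a form of the lexeme.
        return word == self.lemma or word in self.forms


# "run", "runs", "ran" and "running" all belong to the lexeme "run".
run = Lexeme(lemma="run", forms={"run", "runs", "ran", "running"})

# "runner" is a derived term, so it is a separate lexeme with its own forms.
runner = Lexeme(lemma="runner", forms={"runner", "runners"})
```

Here `run.has_form("ran")` is true, while `run.has_form("runner")` is false because the derived term lives in its own lexeme.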

Wikidata Lexemes

The Lexeme: namespace is used to store word data such as words, phrases and sentences. In contrast, the main namespace of Wikidata stores data about conceptual or material entities. Nearly 600,000 Lexeme entries have been created so far, each corresponding to a lexical unit of some language.

This structured word data is released under the CC0 Public Domain Dedication, which means anyone on the planet can use it for any purpose without restriction.

Each lexeme entry in Wikidata can include the different inflected forms, definitions, lexical category, transliteration, synonyms, translations, pronunciation audio and IPA, links to Wikidata items, etc., all in a structured machine-readable format.
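To give a feel for what "structured machine-readable format" means, here is a minimal sketch that reads a trimmed, lexeme-shaped JSON record and pulls out its main parts. The record is illustrative only: the ID `L0` and the form and sense IDs are placeholders, the gloss is written for this example, and the helper `summarize` is hypothetical. The field names (`lemmas`, `language`, `lexicalCategory`, `forms`, `senses`) follow the Wikibase Lexeme JSON layout as far as we know; treat the exact shape as an assumption and check the data model documentation before relying on it.

```python
import json

# Illustrative record only: "L0" and its form/sense IDs are placeholders.
# Q1860 (English) and Q24905 (verb) are the Wikidata items assumed here.
SAMPLE_LEXEME = json.loads("""
{
  "type": "lexeme",
  "id": "L0",
  "lemmas": {"en": {"language": "en", "value": "run"}},
  "language": "Q1860",
  "lexicalCategory": "Q24905",
  "forms": [
    {"id": "L0-F1",
     "representations": {"en": {"language": "en", "value": "runs"}},
     "grammaticalFeatures": []},
    {"id": "L0-F2",
     "representations": {"en": {"language": "en", "value": "ran"}},
     "grammaticalFeatures": []}
  ],
  "senses": [
    {"id": "L0-S1",
     "glosses": {"en": {"language": "en", "value": "to move quickly on foot"}}}
  ]
}
""")


def summarize(entity: dict) -> dict:
    """Collect the lemma(s), forms and glosses of a lexeme-shaped record."""
    return {
        "id": entity["id"],
        "lemmas": [lemma["value"] for lemma in entity["lemmas"].values()],
        "language": entity["language"],
        "lexical_category": entity["lexicalCategory"],
        "forms": [
            rep["value"]
            for form in entity.get("forms", [])
            for rep in form["representations"].values()
        ],
        "glosses": [
            gloss["value"]
            for sense in entity.get("senses", [])
            for gloss in sense["glosses"].values()
        ],
    }


summary = summarize(SAMPLE_LEXEME)
```

Note that the language and the lexical category are stored as Q-IDs rather than as English words, which is what keeps the data language-independent.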

Indian languages

Documentation for major Indian languages is available at Wikidata:Lexicographical data/Documentation/Languages

How to help?

  • You can help by adding words of your language to this project using the Special:NewLexeme page. Any native speaker can contribute words. You just have to familiarize yourself with the project, its jargon, data models and conventions. Technical skills are not required; anyone can contribute with ease once they are familiar with the project.
  • Add pronunciation audio for words in all Indian languages using Lingua Libre or Spell4Wiki.
  • You can also help by translating the documentation pages of this project into your language and by inviting other contributors.

Statistics

Main page: Wikidata:Lexicographical data/Indian languages/Statistics
  • Count of lexemes, forms and senses for Indian languages: query result
  • Count of usage examples and sense images for Indian languages: query result
  • Count of forms with pronunciation audio or IPA: query result
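
Counts like these come from SPARQL queries run against the Wikidata Query Service. The sketch below is not the query behind the links above; it only shows how a lexeme count per language could be expressed. It assumes the `wd:`, `dct:` and `ontolex:` prefixes are predefined by the query service, and that lexemes are typed as `ontolex:LexicalEntry` with their language given by `dct:language`. The function name `lexeme_count_query` is hypothetical, and Q1568 (Hindi) and Q5885 (Tamil) are used as example language items.

```python
import re


def lexeme_count_query(language_qids: list[str]) -> str:
    """Build a SPARQL query counting lexemes for each given language item."""
    for qid in language_qids:
        # Accept only plain item IDs such as "Q1568".
        if not re.fullmatch(r"Q[1-9][0-9]*", qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in language_qids)
    return f"""SELECT ?language (COUNT(?lexeme) AS ?lexemes)
WHERE {{
  VALUES ?language {{ {values} }}
  ?lexeme a ontolex:LexicalEntry ;
          dct:language ?language .
}}
GROUP BY ?language
ORDER BY DESC(?lexemes)"""


# Example: Hindi (Q1568) and Tamil (Q5885), assumed item IDs.
query = lexeme_count_query(["Q1568", "Q5885"])
```

The resulting string can be pasted into the query service. Forms and senses could be counted in a similar way by following `ontolex:lexicalForm` and `ontolex:sense` from each lexeme, again assuming those predicates.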

Contact