Wikidata:WikiProject Languages/Data model

This is a draft description of how items for languages are modelled in Wikidata.

Labels

Labels should not include words like "language" when they are added only for disambiguation. Labels in Wikidata do not have to be unique and are not expected to be identical to the titles of the linked Wikipedia articles.

More information: Help:Label

Examples:

German (Q188)
  • In English, the label is "German" because language names do not normally include "language". People say things like "They're learning German", and not "They're learning German language". The English Wikipedia page is titled "German language", but only because page names have to be unique and "German" is a disambiguation page.
  • In German, the label is "Deutsch" (German) for the same reason as in English. The German Wikipedia page is "Deutsche Sprache" (German language) and "Deutsch" is a disambiguation page.
  • In Japanese, the label is "ドイツ語" (literally: Germany language), because language names are typically formed by adding "語" (language) to a location. This is not disambiguation because without "語", it would mean something different.
American Sign Language (Q14759)
  • In English, the label is "American Sign Language" because "Sign Language" is normally part of the name of a sign language. People typically say things like "They're learning American Sign Language", and not "They're learning American Sign" or "They're learning American". Since it's part of the name, the words are capitalised.

Descriptions

Descriptions should help people identify the item and distinguish it from other similarly named ones. For languages, descriptions typically include the language family and where it is spoken.

Although languages are often closely connected to particular ethnic groups, it is generally not useful to include ethnic groups in the description. The language and the ethnic group often share the same name, and someone familiar with one is usually familiar with the other, so naming the ethnic group rarely helps to identify the item.

Examples:

  • Koro (Sino-Tibetan language spoken in India)
  • Koro (Oceanic language spoken in Papua New Guinea)
  • Koro (Oceanic language spoken in Vanuatu)
  • Koro (Mande language spoken in Ivory Coast)
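The pattern in these examples (language family first, then where the language is spoken) can be sketched as a small helper. This is only an illustration: `describe_language` is a hypothetical function, not part of any Wikidata tooling, and it only covers English descriptions.

```python
def describe_language(family: str, places: list[str]) -> str:
    """Compose an English description such as 'Oceanic language spoken in Vanuatu'."""
    if not places:
        return f"{family} language"
    if len(places) == 1:
        where = places[0]
    else:
        # Join several places as "A, B and C".
        where = ", ".join(places[:-1]) + " and " + places[-1]
    return f"{family} language spoken in {where}"


# The four items labelled "Koro" from the examples above.
koro_items = [
    ("Sino-Tibetan", ["India"]),
    ("Oceanic", ["Papua New Guinea"]),
    ("Oceanic", ["Vanuatu"]),
    ("Mande", ["Ivory Coast"]),
]
descriptions = [describe_language(family, places) for family, places in koro_items]
```

Because all four descriptions differ, the items can be told apart even though they share the same label.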

Basic statements

instance of (P31)

subclass of (P279)

subclass of (P279) is used to indicate the next level up in a language family tree.
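Since each subclass of (P279) statement points one level up, the full family lineage of a language can be recovered by following the links repeatedly. The sketch below assumes a toy mapping with labels instead of item IDs; the chain for German is simplified and is not copied from the actual Wikidata statements, and `ancestors` is a hypothetical helper.

```python
# Toy stand-in for P279 statements: child -> next level up in the family tree.
# Labels are used instead of item IDs, and the chain is simplified.
SUBCLASS_OF = {
    "German": "West Germanic",
    "West Germanic": "Germanic",
    "Germanic": "Indo-European",
}


def ancestors(language: str, subclass_of: dict[str, str]) -> list[str]:
    """Follow P279-style links upward and return the chain of parents."""
    chain = []
    seen = {language}
    current = language
    while current in subclass_of:
        current = subclass_of[current]
        if current in seen:  # guard against accidental cycles in the data
            break
        seen.add(current)
        chain.append(current)
    return chain


lineage = ancestors("German", SUBCLASS_OF)
```

Real items can have more than one P279 value, in which case the mapping would need to hold a list of parents rather than a single one.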

country (P17)

country (P17) is used to indicate the countries where a language is spoken. It does not imply any official status in that country.

indigenous to (P2341)

indigenous to (P2341) is used to indicate the ethnic groups and the locations that a language is indigenous to.

ethnic group (P172), location (P276) and located in the administrative territorial entity (P131) are not used for this information.
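A maintenance script could flag language items that use the discouraged properties. The following is a minimal sketch, assuming the item's statements have already been reduced to a plain mapping from property ID to values; `check_indigenous_modelling` and the placeholder values are hypothetical.

```python
# Properties this page says not to use for "indigenous to" information.
DISCOURAGED = {
    "P172": "ethnic group",
    "P276": "location",
    "P131": "located in the administrative territorial entity",
}


def check_indigenous_modelling(statements: dict[str, list[str]]) -> list[str]:
    """Return warnings for a language item given as {property ID: [values]}."""
    warnings = []
    for pid in sorted(statements):
        if pid in DISCOURAGED:
            warnings.append(
                f"{pid} ({DISCOURAGED[pid]}) found; use P2341 (indigenous to) instead"
            )
    return warnings


# Placeholder values; a real item would hold item IDs here.
example_item = {"P17": ["<country item>"], "P172": ["<ethnic group item>"]}
found = check_indigenous_modelling(example_item)
```

country (P17) is not flagged, because it records where the language is spoken rather than where it is indigenous.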

native label (P1705)

native label (P1705) is used to store the native label (autonym) of the language. It is also sometimes used to add the name in other languages used by the speakers of the language (such as an official language of the country where it is spoken).

If the language isn't available in the list of languages, select "mis" (the code for uncoded languages) and add language of work or name (P407) as a qualifier to identify the language.
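The rule above can be sketched as a function that builds a native label (P1705) statement and insists on the P407 qualifier whenever "mis" is used. The dictionary layout follows the general shape of Wikibase statement JSON as an assumption and should be checked against the Wikibase documentation before use; `native_label_statement` is a hypothetical helper.

```python
from typing import Optional


def native_label_statement(text: str, language_code: str,
                           language_item: Optional[str] = None) -> dict:
    """Build a P1705 statement in the general shape of Wikibase statement JSON."""
    statement = {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P1705",
            "datavalue": {
                "type": "monolingualtext",
                "value": {"text": text, "language": language_code},
            },
        },
    }
    if language_code == "mis":
        # "mis" alone does not say which language the text is in,
        # so a P407 qualifier naming the language item is required.
        if language_item is None:
            raise ValueError("'mis' needs a P407 qualifier naming the language item")
        statement["qualifiers"] = {
            "P407": [{
                "snaktype": "value",
                "property": "P407",
                "datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "id": language_item},
                },
            }]
        }
    return statement


# German has its own language code, so no qualifier is needed.
german = native_label_statement("Deutsch", "de")
```

For a language without its own code, the call would pass "mis" together with the ID of the language item as the third argument.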

writing system (P282)

number of speakers, writers, or signers (P1098)

topic's main category (P910)

topic's main category (P910) is used to link to the category related to the language. Such categories often come from Wiktionary projects, especially for minor languages.

Grammatical and phonetic behaviour

linguistic typology (P4132)

has grammatical case (P2989)

has tense (P3103)

has grammatical mood (P3161)

has grammatical gender (P5109)

has grammatical person (P5110)

has conjugation class (P5206)

has paradigm class (P5913)

has phoneme (P2587)

uses capitalization for (P6106)

External identifiers

Regional

described at URL (P973)

Other databases which don't yet have an external identifier property can be linked using described at URL (P973), e.g.

CLLD databases

Constructed languages