Wikidata:WikiProject Languages/Data model
This is a draft description of how items for languages are modelled in Wikidata.
Labels
Labels should not include words like "language" when they are there only for disambiguation. Labels in Wikidata do not have to be unique and are not expected to be identical to the titles of the linked Wikipedia articles.
More information: Help:Label
Examples:
- German (Q188)
- In English, the label is "German" because language names do not normally include "language". People say things like "They're learning German", and not "They're learning German language". The English Wikipedia page is "German language" but this is necessary because page names have to be unique and "German" is a disambiguation page.
- In German, the label is "Deutsch" (German) for the same reason as English. The German Wikipedia page is "Deutsche Sprache" (German language) and "Deutsch" is a disambiguation page.
- In Japanese, the label is "ドイツ語" (literally: Germany language), because language names are typically formed by adding "語" (language) to a place name. This is not disambiguation: without "語", the word would mean something different ("ドイツ" alone means Germany).
- American Sign Language (Q14759)
- In English, the label is "American Sign Language" because "Sign Language" is normally part of the name of a sign language. People typically say things like "They're learning American Sign Language", and not "They're learning American Sign" or "They're learning American". Since it's part of the name, the words are capitalised.
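To make the difference between labels and Wikipedia page titles concrete, here is a minimal sketch of part of an item in the Wikibase JSON format, using the German (Q188) values from the examples above. The subset of fields shown and the helper functions `label_for` and `label_differs_from_title` are illustrative assumptions, not an official tool; a real item carries many more labels, sitelinks, descriptions, aliases and claims.

```python
from typing import Optional

# Illustrative subset of the Wikibase JSON for German (Q188), built from the
# label examples above. Real items contain far more data than this.
german = {
    "id": "Q188",
    "labels": {
        "en": {"language": "en", "value": "German"},
        "de": {"language": "de", "value": "Deutsch"},
        "ja": {"language": "ja", "value": "ドイツ語"},
    },
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "German language", "badges": []},
        "dewiki": {"site": "dewiki", "title": "Deutsche Sprache", "badges": []},
    },
}


def label_for(item: dict, lang: str) -> Optional[str]:
    """Return the item's label in the given language, or None if absent."""
    entry = item["labels"].get(lang)
    return entry["value"] if entry else None


def label_differs_from_title(item: dict, lang: str) -> bool:
    """Hypothetical helper: True if the label and the <lang>wiki title differ."""
    label = label_for(item, lang)
    sitelink = item["sitelinks"].get(f"{lang}wiki")
    return label is not None and sitelink is not None and label != sitelink["title"]


print(label_for(german, "en"))                 # German
print(label_differs_from_title(german, "en"))  # True
print(label_differs_from_title(german, "de"))  # True
```

The point of the sketch is that the label ("German", "Deutsch") and the page title ("German language", "Deutsche Sprache") are stored separately, so the label does not need to carry the disambiguation that the page title needs.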
Descriptions
Descriptions should help people identify the item and distinguish it from other similarly named ones. For languages, descriptions typically include the language family and where it is spoken.
Although languages are often closely connected to particular ethnic groups, it is generally not useful to include ethnic groups in the description. The language and the ethnic group often share the same name, and people are not likely to be familiar with the language but not the ethnic group, or vice versa.
Examples:
Basic statements
instance of (P31)
subclass of (P279)
subclass of (P279) is used to indicate the next level up in a language family tree.
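Because each language item points one level up, the full family tree can be recovered by following subclass of (P279) repeatedly. The sketch below does this over a small in-memory dictionary; the labels and links in it are hypothetical, simplified data chosen for illustration, not the actual statements on these items. Real items are keyed by Q-ids and may have more than one P279 value, whereas this sketch follows a single parent per item.

```python
from typing import Dict, List

# Hypothetical, simplified "subclass of (P279)" links keyed by label.
# Real Wikidata items use Q-ids and may have several P279 values.
subclass_of: Dict[str, str] = {
    "German": "High German",
    "High German": "West Germanic",
    "West Germanic": "Germanic",
    "Germanic": "Indo-European",
}


def family_path(language: str, parents: Dict[str, str]) -> List[str]:
    """Follow P279 links upwards until an item with no parent is reached."""
    path = [language]
    seen = {language}
    while path[-1] in parents:
        parent = parents[path[-1]]
        if parent in seen:  # stop on an accidental cycle in the data
            break
        path.append(parent)
        seen.add(parent)
    return path


print(" > ".join(family_path("German", subclass_of)))
# German > High German > West Germanic > Germanic > Indo-European
```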
country (P17)
country (P17) is used to indicate the countries where a language is spoken. It does not imply any official status in that country.
indigenous to (P2341)
indigenous to (P2341) is used to indicate the ethnic groups and the locations that a language is indigenous to.
ethnic group (P172), location (P276) and located in the administrative territorial entity (P131) are not used for this information.
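One way to apply this rule when reviewing items is a simple consistency check. The sketch below assumes the item's claims are available as a dictionary keyed by property ID (as in the "claims" object of the Wikibase JSON format) and flags any use of the three properties named above; the function name `indigenous_to_warnings` is hypothetical, and a flagged statement still needs a human to decide whether it belongs in indigenous to (P2341).

```python
from typing import Dict, List

# Properties that this data model says are not used for "indigenous to"
# information on language items.
NOT_FOR_INDIGENOUS_TO = {
    "P172": "ethnic group",
    "P276": "location",
    "P131": "located in the administrative territorial entity",
}


def indigenous_to_warnings(claims: Dict[str, list]) -> List[str]:
    """Hypothetical check: one warning per property that should probably be P2341."""
    return [
        f"{pid} ({name}) found; consider indigenous to (P2341) instead"
        for pid, name in NOT_FOR_INDIGENOUS_TO.items()
        if claims.get(pid)  # only warn when the property has statements
    ]


# Placeholder statements: only the property keys matter for this check.
example_claims = {"P2341": [{}], "P172": [{}]}
for warning in indigenous_to_warnings(example_claims):
    print(warning)
```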
native label (P1705)
native label (P1705) is used to store the native label (autonym) of the language. It is also sometimes used to add the name in other languages used by the speakers of the language (such as an official language of the country where it is spoken).
If the language isn't available in the list of languages, select the code "mis" (uncoded languages) and add language of work or name (P407) as a qualifier.
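A native label statement with the "mis" fallback could look like the sketch below, written as a Python dictionary in the Wikibase JSON statement format (a monolingual text main value plus an item-valued qualifier). The function `native_label_statement` is a hypothetical helper, and "Example autonym" and Q99999999 are placeholder values standing in for a real autonym and a real language item.

```python
from typing import Optional


def item_value(qid: str) -> dict:
    """Datavalue pointing at an item such as Q188."""
    return {
        "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
        "type": "wikibase-entityid",
    }


def native_label_statement(text: str, lang_code: Optional[str] = None,
                           language_qid: Optional[str] = None) -> dict:
    """Hypothetical helper: build a P1705 statement, using "mis" + P407 if needed."""
    if lang_code is None and language_qid is None:
        raise ValueError("an uncoded language needs a P407 item")
    statement = {
        "mainsnak": {
            "snaktype": "value",
            "property": "P1705",
            "datavalue": {
                "value": {"text": text, "language": lang_code or "mis"},
                "type": "monolingualtext",
            },
            "datatype": "monolingualtext",
        },
        "type": "statement",
        "rank": "normal",
    }
    if lang_code is None:
        # "mis" does not say which language this is, so record the language
        # with a language of work or name (P407) qualifier.
        statement["qualifiers"] = {
            "P407": [{
                "snaktype": "value",
                "property": "P407",
                "datavalue": item_value(language_qid),
                "datatype": "wikibase-item",
            }]
        }
    return statement


german_autonym = native_label_statement("Deutsch", "de")
uncoded = native_label_statement("Example autonym", None, "Q99999999")  # placeholders
print(uncoded["mainsnak"]["datavalue"]["value"]["language"])  # mis
print("qualifiers" in german_autonym)                          # False
```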
writing system (P282)
number of speakers, writers, or signers (P1098)
topic's main category (P910)
topic's main category (P910) is used to link to the category related to the language. Such categories often come from Wiktionary projects, especially for minor languages.
Grammatical and phonetic behaviour
linguistic typology (P4132)
has grammatical case (P2989)
has tense (P3103)
has grammatical mood (P3161)
has grammatical gender (P5109)
has grammatical person (P5110)
has conjugation class (P5206)
has paradigm class (P5913)
has phoneme (P2587)
uses capitalization for (P6106)
External identifiers
- ISO 639-1 code (P218)
- ISO 639-2 code (P219)
- ISO 639-3 code (P220)
- ISO 639-5 code (P1798)
- ISO 639-6 code (P221)
- IETF language tag (P305)
- Ethnologue.com language code (P1627)
- Linguist List code (P1232)
- Glottolog code (P1394)
- Linguasphere code (P1396)
- WALS lect code (P1466)
- WALS genus code (P1467)
- WALS family code (P1468)
- endangeredlanguages.com ID (P2192)
- UNESCO Atlas of the World's Languages in Danger ID (P2355)
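Before adding an identifier, it can help to check that the value at least has the right shape. The sketch below covers only the four ISO 639 properties above and assumes their codes are two lowercase letters (ISO 639-1) or three lowercase letters (ISO 639-2, 639-3 and 639-5); these patterns are an assumption and may differ from the format constraints recorded on the property pages. A value that passes is not necessarily an assigned code.

```python
import re

# Assumed code shapes for the ISO 639 properties listed above. They check
# the form of a value only, not whether the code is actually assigned.
ID_PATTERNS = {
    "P218": re.compile(r"[a-z]{2}"),   # ISO 639-1 code
    "P219": re.compile(r"[a-z]{3}"),   # ISO 639-2 code
    "P220": re.compile(r"[a-z]{3}"),   # ISO 639-3 code
    "P1798": re.compile(r"[a-z]{3}"),  # ISO 639-5 code
}


def looks_valid(prop: str, value: str) -> bool:
    """Return True if value has the assumed shape for property prop."""
    pattern = ID_PATTERNS.get(prop)
    if pattern is None:
        raise KeyError(f"no pattern defined for {prop}")
    return pattern.fullmatch(value) is not None


print(looks_valid("P218", "de"))   # True: ISO 639-1 code for German
print(looks_valid("P220", "deu"))  # True: ISO 639-3 code for German
print(looks_valid("P218", "deu"))  # False: three letters for a 639-1 code
```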
Regional
- ABS ASCL 2011 code (P1251) (Australia)
- AUSTLANG code (P1252) (Australia)
- Guthrie code (P2161) (Africa)
- Statistics Indonesia language code (P2590) (Indonesia)
- Newguineaworld ID (P11696) (Papua New Guinea)
described at URL (P973)
Other databases which don't yet have an external identifier property can be linked using described at URL (P973), e.g.
- Asian and African Sign Languages (AASL)
- Numeral Systems of the World's Languages
- Austronesian Basic Vocabulary Database (ABVD)
- Turkic Database at Elegant Lexicon
- ScriptSource
CLLD databases
- Austronesian Comparative Dictionary (ACD)
- AfBo: A world-wide survey of affix borrowing
- Atlas of Pidgin and Creole Language Structures (APiCS)
- Automated Similarity Judgment Program (ASJP)
- electronic World Atlas of Varieties of English (eWAVE)
- Grambank
- PHOIBLE
- Uralic Areal Typology Online (UraTyp)