Wikidata:Lexicographical data/Documentation/Languages/hi
Hindustani (Q11051), or Hindi-Urdu, is a language spoken in India and Pakistan. This page documents the Hindustani (Hindi-Urdu) language for the WikiProject Wikidata:Lexicographical data and is intended for coordinating contributions to Hindustani (Hindi and Urdu) lexeme content and related discussions. WikiProject India is a related WikiProject that covers all Hindustani topics.
Example Hindustani lexeme entries:
Wikidata:Lexemes aims to provide CC0-licensed, structured lexicographical data that anyone can use for any purpose, including Wiktionary and the upcoming Abstract Wikipedia.
Layout
Every lexeme entry has the following layout:
Lexeme-level
The lemma of the lexeme can be considered a title or headword, generally the dictionary form of the word. It is to be written in both hi (Hindi, Devanagari script) and ur (Urdu, Arabic script) spelling variants for the Hindustani language entries. See उठना/اُٹھنا (L1071943) for example.
Every lexeme entry will have a lexeme ID (beginning with "L").
The language of the lexeme should be Hindustani (Q11051) in all cases (that is, not Hindi (Q1568) and not Urdu (Q1617)).
The lexical category should also be specified as broadly as possible, based on the Hindustani linguistic ontology.
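The lexeme-level conventions above lend themselves to a mechanical check. The following is a minimal sketch, assuming the lexeme is available as a dictionary shaped like the Wikibase lexeme JSON (keys `lemmas`, `language`, `lexicalCategory`). The function name `check_lexeme_header` and the sample data are hypothetical and only illustrate the idea.

```python
HINDUSTANI = "Q11051"
REQUIRED_VARIANTS = ("hi", "ur")  # both spelling variants are expected


def check_lexeme_header(lexeme: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    lemmas = lexeme.get("lemmas", {})
    for code in REQUIRED_VARIANTS:
        if not lemmas.get(code, {}).get("value"):
            problems.append(f"missing lemma spelling variant: {code}")
    if lexeme.get("language") != HINDUSTANI:
        problems.append(f"language should be {HINDUSTANI}")
    if not lexeme.get("lexicalCategory"):
        problems.append("missing lexical category")
    return problems


# Hypothetical sample modelled on the उठना/اُٹھنا example above.
sample = {
    "lemmas": {
        "hi": {"language": "hi", "value": "उठना"},
        "ur": {"language": "ur", "value": "اُٹھنا"},
    },
    "language": "Q11051",
    "lexicalCategory": "Q24905",  # verb
}
print(check_lexeme_header(sample))
```

A lexeme whose language is set to Hindi (Q1568) or Urdu (Q1617), or which lacks one of the two spelling variants, would come back with a non-empty list.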
Senses
Senses represent different meanings of the same word.
Some statements that may be added to senses include image, item for this sense, translation, synonym, antonym, usage example, and more (see list). Note that for the translation, antonym, and synonym properties, the sense ID (LXXXXX-S1) of the target lexeme has to be copied and pasted, not the lexeme ID.
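Because a lexeme ID and a sense ID look alike, a quick format check can catch the wrong one before it is pasted. This is a small sketch under the assumption that lexeme IDs have the shape `L` plus digits and sense IDs the shape `L` plus digits, `-S`, and digits; the helper names are hypothetical.

```python
import re

LEXEME_ID = re.compile(r"L[1-9]\d*")
SENSE_ID = re.compile(r"L[1-9]\d*-S[1-9]\d*")


def is_lexeme_id(value: str) -> bool:
    """True for a bare lexeme ID such as L1071943."""
    return LEXEME_ID.fullmatch(value) is not None


def is_sense_id(value: str) -> bool:
    """True for a sense ID such as L1071943-S1."""
    return SENSE_ID.fullmatch(value) is not None


for candidate in ("L1071943", "L1071943-S1"):
    kind = "sense ID" if is_sense_id(candidate) else "not a sense ID"
    print(candidate, "->", kind)
```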
Forms
Forms represent different inflections (cases for nouns/adjectives, conjugations for verbs) of the lexeme (in both Hindi and Urdu spelling variants).
Each noun typically has four forms, one for each combination of number (singular (Q110786)/plural (Q146786)) and case (direct case (Q1751855)/oblique case (Q1233197)). A small number of nouns, often but not always animate, also have vocative inflections (vocative case (Q185077)). Whether these exist depends on the senses of the lexeme, and they should not be added without certainty that they are used.
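The four expected noun slots can be enumerated as number and case pairs, which makes it easy to see which ones a lexeme still lacks. The sketch below assumes each existing form is represented simply as a set of grammatical-feature Q-ids; the function names are hypothetical, and vocative forms are deliberately left out because they are not expected for every noun.

```python
from itertools import product

NUMBERS = {"singular": "Q110786", "plural": "Q146786"}
CASES = {"direct": "Q1751855", "oblique": "Q1233197"}


def expected_noun_slots() -> list:
    """All number x case combinations, each as a frozenset of Q-ids."""
    return [frozenset(pair) for pair in product(NUMBERS.values(), CASES.values())]


def missing_noun_slots(forms) -> list:
    """Slots not covered by any form; extra features (e.g. gender) are ignored."""
    present = [set(features) for features in forms]
    return [
        slot
        for slot in expected_noun_slots()
        if not any(slot <= features for features in present)
    ]


# Hypothetical lexeme with only its two direct-case forms entered.
existing = [
    {"Q110786", "Q1751855"},
    {"Q146786", "Q1751855", "Q499327"},
]
for slot in missing_noun_slots(existing):
    print(sorted(slot))
```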
Structure and properties
Common properties to be added for lexeme entries are given below:
Statements
- grammatical gender (P5185): (masculine (Q499327) / feminine (Q1775415))
- derived from lexeme (P5191)
- usage example (P5831)
- homograph lexeme (P5402)
- combines lexemes (P5238)
Identifiers
- Urdu Lughat ID (P11350) – aggregate online dictionary maintained by the Urdu Dictionary Board, a Karachi-based Pakistani government operation
- Provided below is a key to some of the part of speech abbreviations used in the headings of entries. A key to those used in the footer for etymologies may be found in the menu on the Advanced Search page.
- صف = صفت
- امث = اسمِ مؤنث
- امذ = اسمِ مذکر
- ف ل = فعل لازمی
- ف م = فعل متعدی
- م ف = متعلق فعل
Senses
- item for this sense (P5137)
- image (P18)
- translation (P5972)
- synonym (P5973)
- antonym (P5974)
- hyperonym (P6593)
- gloss quote (P8394)
Forms
- Grammatical features
- Grammatical gender: masculine (Q499327) / feminine (Q1775415)
- Grammatical number: singular (Q110786) / plural (Q146786)
- pronunciation audio (P443)
- IPA transcription (P898)
Spelling
Below are some guidelines for resolving spelling irregularities between the two writing systems, particularly for words that may be poorly attested in one register or the other.
- ष — in Sanskritized words borrowed via Bengali this is ش, otherwise it is کھ. Most words spelled with this letter post-partition are Bengali borrowings.
- ज्ञ — in practice always گی. Some Urdu dictionaries contain spellings with نج under the assumption this cluster represents an independent sound in Hindi, but this does not reflect actual usage.
- ऋ — is always رِ.
- ण — is always ن.
- पुर — word-finally, this is پور rather than پُر.
- ऑ — this vowel is purely decorative and is best ignored even in Devanagari spellings. Most of its use is confined to distinguishing the abbreviation डाॅ॰ “Dr.”.
- आँड़ — this sequence of a nasal and flap is typically written as نڈ in Urdu dictionaries, and it is acceptable to pair these spellings together because the consonants represented by ڑ and ڈ are allophones in native Hindustani words. In English loanwords, only ڈ is realized in all positions, and in vocabulary loaned from Punjabi the positions of ڈ and ڑ are maintained in Urdu spellings because these sounds are not allophones in Punjabi.
- त् — words spelled with this ending in standard Hindi are borrowings from Bengali words ending in ৎ. Although the virama/halant is retained when not followed by a suffix, it is removed in the oblique plural as in तों rather than त्ओं.
- आँव — although انو may be found for this sequence in older Urdu writing, this is now more commonly written as اؤں.
- य — word-finally, spelled with یہ in borrowings from Bengali, otherwise spelled with ے.
- ژ — the value of this letter is always simply ज.
- ہ — the use of this letter word finally is often arbitrary and unetymological. The word commonly spelled پتہ in Urdu is from Punjabi پتا rather than a Persian *پته. If both variants with this letter and ا exist they do not need separate lexemes. The lemma can follow whichever spelling is treated as the primary one in Urdu Lughat.
- ق — some of the words spelled with this letter are native words which have been given pseudo-Arabic spellings, such as قُلی. The nukta form क़ is not necessary to represent this consonant which already had an ambiguous status in Persian. The /q/ phoneme represented by ق does not have phonemic status in Pashto either, and the spelling in Pashto onomatopoeic formations used in Hindustani like تڑق is an emphatic affect.
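Two of the guidelines above (ज्ञ paired with گی, and word-final पुर paired with پور) are regular enough to be checked roughly for a hi/ur lemma pair. The following sketch is only a review aid: the function names are hypothetical, optional Urdu vowel marks are ignored by stripping combining characters, and a returned note means "worth a second look", not "wrong".

```python
import unicodedata

GYA_HI = "ज्ञ"   # Devanagari cluster
GYA_UR = "گی"    # Urdu spelling the guidelines pair with it
PUR_HI = "पुर"   # word-final element in Devanagari
PUR_UR = "پور"   # Urdu spelling the guidelines pair with it


def strip_marks(text: str) -> str:
    """Drop combining marks (such as optional Urdu vowel signs) before comparing."""
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)


def review_pair(hi: str, ur: str) -> list:
    """Return review notes for a Devanagari/Urdu lemma pair (empty if none)."""
    ur_plain = strip_marks(ur)  # only the Urdu side is stripped
    notes = []
    if GYA_HI in hi and GYA_UR not in ur_plain:
        notes.append("jnya cluster: Urdu spelling with gaf + ye expected")
    if hi.endswith(PUR_HI) and not ur_plain.endswith(PUR_UR):
        notes.append("final -pur: Urdu spelling with pe + vao + re expected")
    return notes


print(review_pair("कान" + PUR_HI, "کان" + PUR_UR))
```

The other guidelines depend on etymology or borrowing route (for example whether a word came via Bengali or Punjabi), so they are not covered by this kind of string check.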
Maintenance
- Recent Changes to Hindustani Lexemes
- Search lexemes:
To do
- Add the most frequent missing Hindustani forms to Wikidata lexicographical data.
Lexicographical Coverage
- See also: WD:Lexicographical data/Statistics
- The lexeme forms coverage chart for Hindustani language is given below.
Queries
- Main page: WD:Lexicographical data/Ideas of queries
- Hindustani Q-id: Q11051
1) Get all existing lexemes in Hindustani: query result
The following query uses these:
- Items: Hindustani (Q11051)
SELECT ?lexeme ?lemma WHERE {
  ?lexeme dct:language wd:Q11051 ;
          wikibase:lemma ?lemma .
}
2) Get the count of lexemes in Hindustani belonging to different lexical categories: https://w.wiki/3$cf
3) Query for all Hindi/Urdu nouns that have forms but lack a direct case form: query
The following query uses these:
- Items: Hindustani (Q11051) , noun (Q1084) , direct case (Q1751855)
SELECT DISTINCT ?l ?lemma WHERE {
  # Hindustani nouns that have at least one form with a representation
  ?l a ontolex:LexicalEntry ;
     dct:language wd:Q11051 ;
     wikibase:lexicalCategory wd:Q1084 ;
     wikibase:lemma ?lemma ;
     ontolex:lexicalForm ?form .
  ?form ontolex:representation ?word .
  # Exclude lexemes where any form carries the direct case feature
  MINUS { ?l ontolex:lexicalForm/wikibase:grammaticalFeature wd:Q1751855 . }
}
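Query 3 generalizes to any grammatical feature, such as oblique case (Q1233197). The sketch below builds the query text for a given feature and wraps it in a request URL. It assumes the public Wikidata Query Service endpoint at `https://query.wikidata.org/sparql`; the function names are hypothetical, and the request itself is not sent so that the example runs offline.

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
QID = re.compile(r"Q[1-9]\d*")


def nouns_missing_feature(feature: str, language: str = "Q11051") -> str:
    """SPARQL for nouns that have forms, none of which carries `feature`."""
    for qid in (feature, language):
        if not QID.fullmatch(qid):
            raise ValueError(f"not a Q-id: {qid!r}")
    return (
        "SELECT DISTINCT ?l ?lemma WHERE {\n"
        f"  ?l dct:language wd:{language} ;\n"
        "     wikibase:lexicalCategory wd:Q1084 ;\n"
        "     wikibase:lemma ?lemma ;\n"
        "     ontolex:lexicalForm ?form .\n"
        f"  MINUS {{ ?l ontolex:lexicalForm/wikibase:grammaticalFeature wd:{feature} . }}\n"
        "}"
    )


def request_url(query: str) -> str:
    """URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


oblique_query = nouns_missing_feature("Q1233197")
print(oblique_query)
print(request_url(oblique_query)[:60])
```

Validating the Q-ids before interpolating them keeps stray text out of the query string.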
Resources
Some resources, in addition to the ones listed below, may be found at Commons:Category:Books about the Hindustani language.
Dictionaries
Quotable dictionaries
Public domain dictionaries may be quoted using gloss quote (P8394), referenced with the claims stated in (P248) (appropriate dictionary item), page(s) (P304) (appropriate page number), and reference URL (P854) if applicable.
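The parts of such a quotation can be pictured as a small record. This is a simplified sketch, not the Wikibase JSON serialization: the dictionary layout and function names are hypothetical, and the quoted text and page number are placeholders.

```python
from typing import Optional


def dictionary_reference(stated_in: str, pages: str, url: Optional[str] = None) -> dict:
    """Reference parts named above, keyed by property ID."""
    reference = {"P248": stated_in, "P304": pages}
    if url is not None:
        reference["P854"] = url  # only when a reference URL applies
    return reference


def gloss_quote(text: str, reference: dict) -> dict:
    """A gloss quote (P8394) value together with its reference."""
    return {"property": "P8394", "value": text, "references": [reference]}


example = gloss_quote(
    "(quoted definition goes here)",            # placeholder text
    dictionary_reference("Q116771763", "123"),  # Najm ul-Lughat; page is made up
)
print(example)
```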
Public domain monolingual dictionaries (preferred):
- Najm ul-Lughat (Q116771763)
- Nur ul-Lughat (Q116742594)
- Tuhfat-ul-Hind (Q116733917)
- Khaliq-e-Bari (Q117029031)
Public domain bilingual dictionaries (those in other regional languages preferred):
- Urdu-Punjabi-Hindi dictionary (Q116459885)
- Kangri Shabd Sangraha (Q116222955)
- Masdar-e Fuyuz (Q117189077)
- Hindi Punjabi Kosh (Q117189099)
- A Dictionary of Urdu, Classical Hindi, and English (Q108916279)
- Brice's Romanized Hindústánî and English dictionary
- Fallon's new Hindustani-English dictionary (searchable at https://dsal.uchicago.edu/dictionaries/fallon/)
- Forbes's dictionary, Hindustani and English[1]
- Shakespear's dictionary, Hindūstānī and English[1] (reprinted in Lahore in 1980 as Dictionary, Urdu-English and English-Urdu; searchable at https://dsal.uchicago.edu/dictionaries/shakespear/)
- Yates' dictionary, Hindustání and English
- Q84072461
More may be found here.
Citable, but non-quotable, dictionaries
Other dictionaries that may be cited but not quoted include the following (those glossed in regional languages are likewise preferable):
- Hindko Urdu Lughat (Q115437685)
- Sindhi-Urdu lughat (Q116740442)
- Burushaski-Urdu Lughat (Q115929776)
- Urdu Punjabi Lughat (Q65398900)
- Pehli Waddi Saraiki Lughat (Q113960284)
- Bahri's Learners' Hindi-English dictionary
- Caturvedi's practical Hindi-English dictionary
- Qureshi's Kitabistan's 20th century standard dictionary
- Hindi-Chinese Kosh (Q113530710)
- Tulsi Shabdsagar ([1] [2])
- Manak Hindi Kosh (vol. 1, vol. 2, vol. 3, vol. 4, vol. 5)
- Lughat-e Firozi (part)
- Brajbhasha Sur-kosh (vol. 1, vol. 2)
Phrases
- A dictionary of Hindustani proverbs (Q110625160)
- Bholanath Trivedi's वृहत हिन्दी लोकोक्ति कोश
- Siyanriyan nan Khanrae (Q116659928)
- Khazina-e muhawarat; or, Urdu idioms (Q116771776)
- Psalms and Proverbs in Kaithi Hindi (Q117037597)
Grammars
- Urdu Qaid (Q116771815)
- John Dowson's Grammar of the Urdū or Hindūstānī Language
- Edwin Greaves' Hindi Grammar
- Samuel Henry Kellogg and T. Grahame Bailey's Grammar of the Hindi language
- George Small's Grammar of the Urdū Or Hindūstānī Language in Its Romanized Character
Orthography
- Urdu imla (Q115780460) – a comprehensive public domain work explicating the history of Urdu orthography
Regional Context
Tools
Contact
- Wikidata talk:WikiProject India can provide help with Hindi- and Urdu-related questions
- User:Vis M