Wikidata:Lexicographical data/Documentation/Languages/br

Breton
natural language, modern language
Subclass of: Southwestern Brythonic
Part of: regional languages of France
Native label: brezhoneg, Brezhoneg
Country: France
Indigenous to: Brittany, Brittany
Start time: 5th century
Linguistic typology: verb–subject–object, V2 word order, nominative–accusative language, fusional language
Has grammatical gender: masculine, feminine, neuter
Writing system: Breton alphabet
Language regulatory body: Ofis Publik ar Brezhoneg
UNESCO language status: 4 severely endangered
Ethnologue language status: 7 Shifting
Hashtag: bzhg
Has conjugation class: regular Breton conjugation
Related category: Category:Breton pronunciation
Wikimedia language code: br

Context

Language spoken in Brittany (Q327), with limited recognition in Brittany (Q12130) and in France (Q142).

Part of the Brythonic (Q156877) languages (with Cornish (Q25289) and Welsh (Q9309)), themselves part of Insular Celtic (Q214506), within Celtic (Q25293) and Indo-European (Q19860).

Successor of Middle Breton (Q787610) and Old Breton (Q3558112).

Dialects:

Sample Lexemes by Lexical Category

  • noun: ki (L69)
  • verb: labourat (L764)
  • adjective: gwenn (L30901)
  • personal pronoun: c'hwi (L409291)
  • adverb: eno (L628445)
  • preposition: gant (L184227)
  • conjunction: ha (L56905)
  • interjection: ac'h (L641623)
  • numeral: daou (L347658)

Uses the Breton alphabet (Q20581324), a modified version of the Latin alphabet (Q41670).

A (Q9659), B (Q9705), ch (Q142237), C'h (Q2344267), D (Q9884), E (Q9907), F (Q9765), G (Q9739), H (Q9914), I (Q9893), J (Q9773), K (Q9922), L (Q9927), M (Q9933), N (Q9937), O (Q9941), P (Q9946), R (Q9852), S (Q9956), T (Q9813), U (Q9747), V (Q9963), W (Q9964), Y (Q9973), Z (Q9751).

Also used (rare):

diacritic (Q162940)

Rarer, mostly in old documents from before the orthography stabilized:

Mainly masculine (Q499327) and feminine (Q1775415).

Some traces of inanimate (Q51927539)[3] and neuter (Q1775461).

Breton has 3 grammatical numbers: singular (Q110786), dual (Q110022), plural (Q146786).

dual (Q110022)

The dual is formed with a gendered prefix (daou- / div-)[4] and is restricted to a small number of lexemes; other lexemes use the common "a pair of" construction to express the equivalent of the dual[5]. A dual can itself be pluralized to represent several pairs (double plural). See lagad (L114): lagad (eye), lagadoù ("eyes"@en, rare, as it refers to separate eyes, as in "the Witch has a bag of eyes"), daoulagad (two eyes ≈ *"bi-eye", as in "the two eyes on someone's face", "she has blue eyes [on her face]"), daoulagadoù (two-eyes, as in "the eyes in a crowd of people").
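As a rough illustration of the four number forms listed for lagad (L114), here is a minimal Python sketch. The function name `number_forms` and the mapping of daou- to masculine and div- to feminine are assumptions made for this example; the text above only says the prefix is gendered. The sketch uses plain concatenation, so it does not handle nouns whose stem changes or mutates after the prefix.

```python
# Hypothetical helper, for illustration only.
# Assumption: daou- goes with masculine nouns and div- with feminine nouns.
DUAL_PREFIX = {"masculine": "daou", "feminine": "div"}


def number_forms(stem, gender, plural_suffix="où"):
    """Build singular, plural, dual and double plural by plain concatenation.

    Simplification: no consonant mutation or stem change is applied, so this
    only suits nouns such as "lagad" whose stem stays unchanged.
    """
    dual = DUAL_PREFIX[gender] + stem
    return {
        "singular": stem,
        "plural": stem + plural_suffix,
        "dual": dual,
        "double plural": dual + plural_suffix,
    }


forms = number_forms("lagad", "masculine")
print(forms["dual"])           # daoulagad
print(forms["double plural"])  # daoulagadoù
```

Since the dual only exists for a small number of lexemes, a lookup of attested forms would be safer than generating them for arbitrary nouns.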

plural (Q146786)

Plurals can be irregular, and there are many different suffixes for different classes of nouns: -où (the most common), -ed (for people), -ien/-ion (also for people). The suffix can change the stem: for kazh (L458), the plural is "kizhier", where the suffix -ier changes the "a" of the stem into "i". There are also broken plurals (Q3392669), and some words have a totally irregular plural, sometimes by suppletion (Q324982) (see ki (L69), ki/chas).

collective (Q694268) and singulative (Q1450795)

For some words (around 10%), the unmarked form is a collective (Q694268) and a singulative (Q1450795) is derived from it; the singulative can itself have a plural form, similar to a paucal (Q489410).

See gwez (L62):

  • gwez: trees;
  • gwezenn: tree;
  • gwezennoù: a few trees.
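The gwez (L62) example can be sketched in Python as a chain of suffixes. The function name `singulative_forms` is hypothetical, and the suffixes -enn and -où are read off this single example, so treat them as assumptions rather than a general rule.

```python
# Hypothetical helper, for illustration only.
def singulative_forms(collective, singulative_suffix="enn", plural_suffix="où"):
    """Derive the singulative and its plural from an unmarked collective.

    Assumption: both suffixes are taken from the gwez / gwezenn / gwezennoù
    example and are simply appended, with no stem change.
    """
    singulative = collective + singulative_suffix
    return {
        "collective": collective,
        "singulative": singulative,
        "singulative plural": singulative + plural_suffix,
    }


forms = singulative_forms("gwez")
print(forms["singulative"])         # gwezenn
print(forms["singulative plural"])  # gwezennoù
```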

All verbs are regular except six: kaout (L3395), bezañ (L3396), ober/gober (L3397), mont (L3398), dont (L3399), gouzout (L3400).

Breton has consonant mutation (Q557863): the first consonant of a word changes depending on the context and on the preceding word.

Only these consonants are affected: K, T, P, G, D, B, M, Gw. There are 4 types of mutation.

no mutation (Q101252532) | soft mutation (Q56648699) | aspirate mutation (Q56648701) | hard mutation (Q97130345)
k | g | c'h | -
t | d | z | -
p | b | f | -
g | c'h | - | k
d | z | - | t
b | v | - | p
m | v | - | -
gw | w | - | kw

Example with ki (L69):

  • ø ki = dog
  • ur c'hi = a dog ; ar c'hi = the dog ; ma c'hi = my dog
  • da gi = your dog
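The mutation table above can be read as a lookup from the initial consonant to its mutated form. The Python sketch below encodes exactly the pairs listed in the table; the names `MUTATIONS` and `mutate` are hypothetical. It assumes lower-case input and does not decide which mutation a given context triggers, which depends on the preceding word.

```python
# Hypothetical lookup built from the table above (lower-case initials only).
MUTATIONS = {
    "soft": {"k": "g", "t": "d", "p": "b", "g": "c'h",
             "d": "z", "b": "v", "m": "v", "gw": "w"},
    "aspirate": {"k": "c'h", "t": "z", "p": "f"},
    "hard": {"g": "k", "d": "t", "b": "p", "gw": "kw"},
}


def mutate(word, kind):
    """Return the word with its initial consonant mutated, if the table has it.

    Words whose initial is not listed for this mutation are returned unchanged.
    """
    table = MUTATIONS[kind]
    # Try longer initials first so that "gw" is matched before "g".
    for initial in sorted(table, key=len, reverse=True):
        if word.startswith(initial):
            return table[initial] + word[len(initial):]
    return word


print(mutate("ki", "aspirate"))  # c'hi, as in "ur c'hi"
print(mutate("ki", "soft"))      # gi, as in "da gi"
print(mutate("gwenn", "soft"))   # wenn
```

Matching "gw" before "g" matters for the soft mutation, where the table gives w for gw but c'h for g.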

There is also a nasal mutation (Q56648700) but it's used in only a very small number of words starting with a D, dor (L229826) being the only common word concerned.

There is also a mixed mutation, which is a mix of soft and hard mutation.

Main one: peurunvan orthography (Q55085635)

Less common: academic Breton orthography (Q54555486), interdialectal Breton orthography (Q54555509)

There are also many rare or old orthographies.

Queries

Words without sense

#title:Words without sense
SELECT ?l ?lemma WHERE {
  ?l dct:language wd:Q12107 ; wikibase:lemma ?lemma .
  MINUS { ?l ontolex:sense ?s . }
}
ORDER BY ?lemma

Words without form

#title:Words without form
SELECT ?l ?lemma WHERE {
  ?l dct:language wd:Q12107 ; wikibase:lemma ?lemma .
  MINUS { ?l ontolex:lexicalForm ?f . }
}
ORDER BY ?lemma
  1. To be improved

Mutable words without mutation grammatical features

#title:Mutable words without mutation grammatical features (k only for now)
SELECT ?l ?representation ?initial WHERE {
  ?l dct:language wd:Q12107 ; wikibase:lemma ?lemma .
  ?l ontolex:lexicalForm ?form .
  ?form ontolex:representation ?representation .
  BIND ( SUBSTR(STR(?representation),1,1) AS ?initial )
  FILTER ( ?initial = "k" ) #extend to k, t, p, g, d, b, m for "no mutation" + other letters for other mutations
  MINUS { ?form wikibase:grammaticalFeature wd:Q101252532 }
}
ORDER BY ?lemma

Nouns (noun (Q1084)) without grammatical gender (P5185)

#title:Nouns without grammatical gender
SELECT ?l ?lemma WHERE {
  ?l dct:language wd:Q12107 ; wikibase:lexicalCategory wd:Q1084 ; wikibase:lemma ?lemma .
  MINUS { ?l wdt:P5185 [] }
}

Lexemes using described by source (P1343) = Lexique étymologique du breton moderne (Q19216625)

SELECT DISTINCT ?item ?lemma ?page WHERE {
  ?item wikibase:lemma ?lemma ; dct:language wd:Q12107 ; p:P1343 [ ps:P1343 wd:Q19216625 ; pq:P304 ?page ] .
}

Lexemes described by source (P1343) Lexique étymologique du breton moderne (Q19216625) without sense

SELECT DISTINCT ?lexeme ?lemma ?lexicalCategoryLabel {
  ?lexeme wikibase:lemma ?lemma ; wikibase:lexicalCategory ?lexicalCategory ; wdt:P1343 wd:Q19216625 .
  FILTER NOT EXISTS { ?lexeme ontolex:sense ?sense } .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],br,fr" }
}
ORDER BY ?lexicalCategoryLabel ?lemma

Verb lexemes where the past participle (Q12717679) is not consistent with word stem (P5187)

SELECT * WHERE {
  ?l wikibase:lemma ?lemma ; wikibase:lexicalCategory wd:Q24905 ; dct:language wd:Q12107 .
  ?l wdt:P5187 ?them .
  BIND ( CONCAT(?them , "et") AS ?part )
  ?l ontolex:lexicalForm ?f .
  ?f wikibase:grammaticalFeature wd:Q12717679 .
  ?f ontolex:representation ?form .
  FILTER ( ?part != str(?form) )
}

Common mistake, confusing collective (Q13473501) (wrong) and collective (Q694268) (right):

#title:Wrong grammatical feature
SELECT * WHERE {
  ?l dct:language wd:Q12107 ;
     ontolex:lexicalForm ?f .
  ?f wikibase:grammaticalFeature wd:Q13473501 .
}

Forms with "no mutation" not starting with a mutable letter

#title:Forms with "no mutation" not starting with a mutable letter
SELECT * WHERE {
  ?l dct:language wd:Q12107 ; ontolex:lexicalForm ?f .
  ?f wikibase:grammaticalFeature wd:Q101252532 ; ontolex:representation ?form .
  FILTER (!REGEX(?form,"^[ktpgdbmKTPGDBM]"))
}

Forms with "soft mutation" not starting with a soft letter

#title:Forms with "soft mutation" not starting with a soft letter
SELECT * WHERE {
  ?l dct:language wd:Q12107 ; ontolex:lexicalForm ?f .
  ?f wikibase:grammaticalFeature wd:Q56648699 ; ontolex:representation ?form .
  FILTER (!REGEX(?form,"^[gdbczvwoGDBCZVWO]"))
}

Forms with "hard mutation" not starting with a hard letter

#title:Forms with "hard mutation" not starting with a hard letter
SELECT * WHERE {
  ?l dct:language wd:Q12107 ; ontolex:lexicalForm ?f .
  ?f wikibase:grammaticalFeature wd:Q97130345 ; ontolex:representation ?form .
  FILTER (!REGEX(?form,"^[ktpKTP]"))
}

Forms with "aspirate mutation" not starting with an aspirate letter

#title:Forms with "aspirate mutation" not starting with an aspirate letter
SELECT * WHERE {
  ?l dct:language wd:Q12107 ; ontolex:lexicalForm ?f .
  ?f wikibase:grammaticalFeature wd:Q56648701 ; ontolex:representation ?form .
  FILTER (!REGEX(?form,"^[fzcFZC]"))
}

Lexical Masks

See Wikidata:Lexical Masks.

History on Wikidata

Resources

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 8,703
  • Forms in Wikipedia: 9,552
  • Tokens: 1,459,030
  • Covered forms: 1,719 (18.0%)
  • Missing forms: 7,833 (82.0%)
  • Covered tokens: 1,021,384 (70.0%)
  • Missing tokens: 437,646 (30.0%)
  • Most frequent missing forms
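The percentages above appear to be shares of the corpus totals: covered forms over "Forms in Wikipedia", and covered tokens over "Tokens". That reading is an assumption based on the figures adding up, not something the statistics state. A short Python check, with a hypothetical helper `share`:

```python
# Figures copied from the list above; variable names are illustrative.
forms_in_corpus = 9_552
covered_forms = 1_719
missing_forms = 7_833
tokens = 1_459_030
covered_tokens = 1_021_384
missing_tokens = 437_646


def share(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(share(covered_forms, forms_in_corpus))  # 18.0
print(share(missing_forms, forms_in_corpus))  # 82.0
print(share(covered_tokens, tokens))          # 70.0
print(share(missing_tokens, tokens))          # 30.0
```

The gap between the two shares is consistent with the covered forms being frequent ones: under this reading, 18.0% of the distinct forms account for 70.0% of the tokens.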

References

  1. See the explanation in French in « Instruction pour l’usage du lexique » in the Lexique étymologique du breton moderne, Victor Henry (Q1386172), 1900.
  2. See « Notice sur l’ortographe bretonne et sa prononciation ».
  3. For example, the suffix "-se" for "this"@en. In verb conjugation, for the third-person singular (Q51929447), the third form is described as neutral, neuter, human or collective depending on the grammar.
  4. It is quite similar to the prefix bi- in several languages, as in "biplane"@en.
  5. "ur re"@br is the Breton for "a set or a pair"@en: "ur re votoù"@br = "a pair of shoes"@en. If used before a dual (rare), the pair is not counted twice: "ur re zaoulagad" is "a pair of eyes" (so only 2 eyes, not 4, even though it is literally "a pair of two-eye").