Wikidata:Lexicographical data/Documentation/Languages/fr

French
modern language, natural language, language
Subclass of: Oïl, Southern European language
Native label: français
Follows: Classical French
Has grammatical case: no value
Has grammatical gender: feminine, masculine
Writing system: Latin script
Uses capitalization for: proper noun, unknown value
Language regulatory body: Académie Française, Academie Royale de Langue et de littérature Françaises, Office québécois de la langue française
UNESCO language status: 1 safe
Ethnologue language status: 0 International
Studied in: French studies, Romance studies, Q11333703
History of topic: history of French
Related category: Category:French pronunciation
Entry in abbreviations table: франц.
Wikimedia language code: fr

Context

French is spoken in France (Q142), Belgium (Q31), Switzerland (Q39), and Canada (Q16), in particular Quebec (Q176).

Language codes used on the lemmas of French lexemes:

SELECT ?languageCode (COUNT(?lexeme) AS ?count) WHERE {
  ?lexeme dct:language wd:Q150 ; wikibase:lemma ?lemma .
  BIND(LANG(?lemma) AS ?languageCode)
}
GROUP BY ?languageCode
ORDER BY DESC(COUNT(?lexeme))
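To run a query such as the one above outside the Query Service interface, a minimal Python sketch could look like the following. It assumes the public endpoint https://query.wikidata.org/sparql and the standard SPARQL JSON results layout; the function names and the User-Agent string are illustrative and not part of this documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?languageCode (COUNT(?lexeme) AS ?count) WHERE {
  ?lexeme dct:language wd:Q150 ; wikibase:lemma ?lemma .
  BIND(LANG(?lemma) AS ?languageCode)
}
GROUP BY ?languageCode
ORDER BY DESC(?count)
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"


def counts_by_code(result):
    """Map each lemma language code to its lexeme count."""
    return {
        row["languageCode"]["value"]: int(row["count"]["value"])
        for row in result["results"]["bindings"]
    }


def run(query):
    """Send the query over the network and return the parsed JSON."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "fr-lexeme-doc-example/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `counts_by_code(run(QUERY))` performs the network request; the two helper functions can be used and checked without network access.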

Lexical categories

SELECT ?lexCat ?lexCatLabel (COUNT(?lexeme) AS ?count) WHERE {
  ?lexeme dct:language wd:Q150 ; wikibase:lexicalCategory ?lexCat .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?lexCat ?lexCatLabel
ORDER BY DESC(COUNT(?lexeme))

French has two grammatical genders: masculine (Q499327) and feminine (Q1775415).

A few rare nouns change gender with number. The three famous cases are orgue (L471), amour (L1021) and délice (L15976), which are masculine in the singular and feminine in the plural.

For some nouns, dictionaries disagree on the gender, for example après-midi (L25740).

Occupation names raise an open question: the masculine is sometimes (perhaps in an old-fashioned way) treated as the neutral or generic form, and for some nouns the masculine and feminine are identical, for example pirate (L24230) and géologue (L621684).

SELECT ?genre ?genreLabel (COUNT(?l) AS ?nb) WHERE {
  ?l dct:language wd:Q150 ; p:P5185/ps:P5185 ?genre .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?genre ?genreLabel
ORDER BY DESC(?nb)

Two grammatical numbers: singular (Q110786), plural (Q146786).

Some nouns are invariable, but we still need two separate forms, one for each number.
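As an illustration of the rule for invariable nouns, here is a minimal Python sketch. The dictionary layout (`representation`, `features`) and the function name are hypothetical and do not reproduce the Wikibase JSON format; only the two item IDs come from this page.

```python
SINGULAR = "Q110786"
PLURAL = "Q146786"


def noun_forms(singular, plural=None):
    """Return one form per grammatical number.

    If no plural is given, the noun is treated as invariable: the same
    spelling is reused, but two distinct forms are still produced.
    """
    if plural is None:
        plural = singular
    return [
        {"representation": singular, "features": [SINGULAR]},
        {"representation": plural, "features": [PLURAL]},
    ]


# "souris" is invariable: une souris, des souris.
forms = noun_forms("souris")
```

Here `forms` holds two entries with the same spelling and different number features, which is the shape the rule asks for.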

For verbs, the infinitive should be used as the main lemma, in lowercase (lemmas are not titles).

Forms

Forms vary depending on several grammatical features:

Note: conditional (Q625581) is sometimes considered part of the indicative (Q682111); on Wikidata, it is kept as a mood of its own.

On Wikidata, the proposal is to enter only non-obvious forms (for instance, compound tenses and the gerund would not be entered). In the general case, this gives 51 forms per verb:

With:
* combining first person (Q21714344) / second person (Q51929049) / third person (Q51929074) with singular (Q110786) / plural (Q146786)
** second person singular, first person plural and second person plural (the three persons of the imperative)
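The figure of 51 can be checked by enumerating the slots. The table of moods and tenses is not reproduced here, so the breakdown in this Python sketch is an assumption: one infinitive, one present participle, four past participles (two genders by two numbers), seven simple tenses with six person and number combinations each, and three imperative forms (second person singular, first person plural, second person plural). All names in the code are illustrative.

```python
from itertools import product

PERSONS = ["first person", "second person", "third person"]
NUMBERS = ["singular", "plural"]
GENDERS = ["masculine", "feminine"]

# Assumed list of simple tenses that take all six person/number combinations.
SIX_PERSON_TENSES = [
    ("indicative", "present"),
    ("indicative", "imperfect"),
    ("indicative", "simple past"),
    ("indicative", "future"),
    ("conditional", "present"),
    ("subjunctive", "present"),
    ("subjunctive", "imperfect"),
]

# Assumed: the imperative only has these three person/number combinations.
IMPERATIVE_SLOTS = [
    ("second person", "singular"),
    ("first person", "plural"),
    ("second person", "plural"),
]


def verb_slots():
    """List every form slot as a tuple of feature names."""
    slots = [("infinitive",), ("present participle",)]
    slots += [("past participle", g, n) for g, n in product(GENDERS, NUMBERS)]
    for mood, tense in SIX_PERSON_TENSES:
        slots += [(mood, tense, p, n) for n, p in product(NUMBERS, PERSONS)]
    slots += [("imperative", "present", p, n) for p, n in IMPERATIVE_SLOTS]
    return slots


slot_count = len(verb_slots())  # 2 + 4 + 7 * 6 + 3
```

Under these assumptions the count comes to 51, which matches the figure given for the general case.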

Some verbs, known as defective verbs (defective verb (Q2721259)), have fewer forms, like pleuvoir (L1917). Others have more forms, like payer (L10770), which has variant spellings such as je paie and je paye.

Grammatical features must use atomic values listed above (for instance first person (Q21714344) and singular (Q110786) instead of first-person singular (Q51929218)).
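The rule on atomic values can be sketched as a small normalisation step. This Python example is only an illustration: the mapping contains the single composite item named on this page, and the function name is hypothetical.

```python
FIRST_PERSON = "Q21714344"
SECOND_PERSON = "Q51929049"
THIRD_PERSON = "Q51929074"
SINGULAR = "Q110786"
PLURAL = "Q146786"

# Only the composite item mentioned in the text; other composites
# would have to be added to this mapping in the same way.
COMPOSITE_TO_ATOMIC = {
    "Q51929218": (FIRST_PERSON, SINGULAR),  # first-person singular
}


def atomic_features(features):
    """Replace composite feature items by their atomic parts, without duplicates."""
    result = []
    for feature in features:
        for part in COMPOSITE_TO_ATOMIC.get(feature, (feature,)):
            if part not in result:
                result.append(part)
    return result
```

For example, `atomic_features(["Q51929218"])` yields the two atomic items for first person and singular, while a list that is already atomic is returned unchanged.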

Groups

French verbs fall into three conjugation groups. The group is stored with conjugation class (P5186):

#title: French verbs by group
SELECT ?group ?groupLabel (COUNT(?lexeme) AS ?nb) WHERE {
  ?lexeme dct:language wd:Q150 ; wdt:P5186 ?group .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?group ?groupLabel
ORDER BY ?groupLabel

Notes on filling groups on Wikidata

#title: French verbs ending in -er without a conjugation class
# ?qs pairs the lexeme ID (the entity URI without its 31-character prefix) with Q2993354
SELECT ?lexeme (GROUP_CONCAT(?lemma; separator = ', ') AS ?lemmas) (CONCAT(SUBSTR(STR(?lexeme), 32), ',Q2993354') AS ?qs) {
  ?lexeme dct:language wd:Q150 ; wikibase:lexicalCategory wd:Q24905 ; wikibase:lemma ?lemma .
  FILTER NOT EXISTS { ?lexeme wdt:P5186 [] }
  FILTER(REGEX(?lemma, 'er$'))
}
GROUP BY ?lexeme
ORDER BY ?lemmas

#title: French verbs ending in -oir, or in neither -er nor -ir, without a conjugation class
# ?qs pairs the lexeme ID with Q2993358
SELECT ?lexeme (GROUP_CONCAT(?lemma; separator = ', ') AS ?lemmas) (CONCAT(SUBSTR(STR(?lexeme), 32), ',Q2993358') AS ?qs) {
  ?lexeme dct:language wd:Q150 ; wikibase:lexicalCategory wd:Q24905 ; wikibase:lemma ?lemma .
  FILTER NOT EXISTS { ?lexeme wdt:P5186 [] }
  FILTER((!REGEX(?lemma, 'er$') && !REGEX(?lemma, 'ir$')) || REGEX(?lemma, 'oir$'))
}
GROUP BY ?lexeme
ORDER BY ?lemmas

#title: French verbs without a conjugation class, with a link to the French Wiktionary conjugation page
SELECT ?lexeme (GROUP_CONCAT(?lemma; separator = ', ') AS ?lemmas) (IRI(CONCAT('https://fr.wiktionary.org/wiki/Conjugaison:français/', ?lemmas)) AS ?wkt) {
  ?lexeme dct:language wd:Q150 ; wikibase:lexicalCategory wd:Q24905 ; wikibase:lemma ?lemma .
  FILTER NOT EXISTS { ?lexeme wdt:P5186 [] }
}
GROUP BY ?lexeme
ORDER BY ?lemmas
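The ending-based filters used in the first two queries can be mirrored offline, for instance to pre-sort a list of lemmas. The Python sketch below is an illustration, not the method used on Wikidata: the function names are hypothetical, and the two item IDs are simply the ones that appear in the queries. Results still need review, since an ending is only a hint (aller ends in -er but is traditionally placed in the third group).

```python
import re

GROUP_FOR_ER = "Q2993354"     # item paired with the -er filter
GROUP_FOR_OTHER = "Q2993358"  # item paired with the -oir / non -er, non -ir filter

ENTITY_PREFIX = "http://www.wikidata.org/entity/"  # 31 characters


def candidate_group(lemma):
    """Return the candidate conjugation class item, or None if the ending is ambiguous."""
    if re.search(r"oir$", lemma):
        return GROUP_FOR_OTHER
    if re.search(r"er$", lemma):
        return GROUP_FOR_ER
    if re.search(r"ir$", lemma):
        return None  # other -ir verbs match neither filter and need a manual check
    return GROUP_FOR_OTHER


def qs_fragment(lexeme_uri, group):
    """Build the 'L-id,Q-id' string, like SUBSTR(STR(?lexeme), 32) in the queries."""
    return f"{lexeme_uri[len(ENTITY_PREFIX):]},{group}"
```

Verbs ending in -ir (other than -oir) return `None`, which corresponds to the third query: they are listed with a Wiktionary conjugation link so that the group can be checked by hand.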

Identifiers

The query below lists the identifiers used on French lexemes. As of 2024-04-07, the top three were Cordial Dictionary ID (P11178), Larousse Online French Dictionary ID (P11118) and Littré ID (P7724), each with more than 10,000 uses.

SELECT ?prop ?propLabel (COUNT(?l) AS ?number) WHERE {
  ?l dct:language wd:Q150 ;
     ?dict ?id .
  ?prop wikibase:directClaim ?dict .
  ?prop wdt:P31 wd:Q56216056 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?prop ?propLabel
ORDER BY DESC(?number)

Resources

  • Forms in Wikidata: 251,898
  • Forms in Wikipedia: 465,138
  • Tokens: 474,988,250
  • Covered forms: 54,502 (11.7%)
  • Missing forms: 410,636 (88.3%)
  • Covered tokens: 415,303,686 (87.4%)
  • Missing tokens: 59,684,564 (12.6%)
  • Most frequent missing forms
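The percentages in this list can be recomputed from the raw counts. The short Python sketch below assumes that the shares are taken relative to the Wikipedia totals (forms and tokens) and that "covered" means Wikipedia forms that also exist as forms in Wikidata; the variable and function names are illustrative.

```python
def share(part, total):
    """Percentage of total represented by part, rounded to one decimal."""
    return round(100 * part / total, 1)


forms_total = 465_138       # forms in Wikipedia
covered_forms = 54_502
tokens_total = 474_988_250  # tokens
covered_tokens = 415_303_686

missing_forms = forms_total - covered_forms
missing_tokens = tokens_total - covered_tokens

form_coverage = share(covered_forms, forms_total)
token_coverage = share(covered_tokens, tokens_total)
```

The gap between the two coverage figures reflects that the covered forms, although a small share of all distinct forms, account for most of the tokens in the corpus.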
