"language" in the label

Is it really necessary to have the word "language" in the English label? Wikipedia generally adds "language" in things like "Armenian language" to distinguish it from other meanings (nationality, etc), whereas here we could have just "Armenian", because we don't have a problem with duplicate labels. --Yair rand (talk) 05:03, 6 November 2012 (UTC)

  • In principle, you are right, but as long as search is substandard and does not show descriptions, I think it is better to use extended labels. --Ymblanter (talk) 06:35, 6 November 2012 (UTC)
    • Based on a few searches, it is actually a lot easier to locate a language by searching for "English language", for example, rather than just "English". What about aliases? I notice that as of right now, "English language" is also known as "English". Should every language also be known as itself minus "language"? --Haplology (talk) 16:08, 12 November 2012 (UTC)
      I would add it indeed.--Ymblanter (talk) 19:43, 12 November 2012 (UTC)

bot imports

As you may have noticed my bot created many entities about languages today. I started the bot at linked articles on en:List of languages by name and de:Kategorie:Einzelsprache. Pages not having langlinks or causing conflict are skipped. The bot will finish within the next hour. Merlissimo (talk) 23:11, 12 November 2012 (UTC)

Extinct languages?

Is there any treatment of extinct languages? Examples: Gothic, Phoenician, Akkadian, Sumerian. --Giftzwerg 88 (talk) 12:38, 27 April 2013 (UTC)

If they have an article in any Wikipedia, they must also have an item in Wikidata with the necessary properties. Infovarius (talk) 09:00, 10 June 2013 (UTC)
I would argue they should exist with appropriate claims and sources even if they don't have Wikipedia articles, but of course the notability policy does not allow that... πr2 (tc) 17:01, 3 April 2014 (UTC)

Instance of language

A lot of articles on individual languages currently have P:P31 ("instance of") set to Q4113741 ("langue and parole"), which seems strange. (See Special:WhatLinksHere/Q4113741 for a list.) I think this is due to Q4113741 previously listing articles now included in the item Q34770 ("single language").

But what should be the main value of P:P31 for individual languages? Q315 or Q34770? Gabbe (talk) 10:29, 4 May 2013 (UTC)

I would say Q34770, clearly, but I am not sure whether it should be P:P31 or P:P279. --Zolo (talk) 16:15, 5 May 2013 (UTC)
I've changed the ones that had Q4113741 to something else, mostly Q34770 with a few exceptions. Gabbe (talk) 17:48, 5 May 2013 (UTC)

<languages />

As far as I can understand, this code inserts the box in which you can choose the languages. I see a major problem coming up. For now we have some 300 languages, but the number of languages (and dialects) increases with time, and the number of translated pages will increase as well. Within some months we will have a very big box to choose from, and people will get mixed up with hundreds of links. Instead we should have a box with some 20-30 languages on top, including the world's most common languages and the languages of the top ten Wikipedias by size; all links to the other languages should be placed in geographical sections which will pop up. That way we can keep the boxes small and still have access to all languages without messing up the display. --Giftzwerg 88 (talk) 11:51, 4 May 2013 (UTC)

Language taxonomy

Here are some examples of how the properties expressing language taxonomy are currently implemented:

Item P133 ("language family") P283 ("group of languages") P284 ("subgroup of languages") P285 ("branch of languages")
Q1321 ("Spanish") Q19860 ("Indo-European languages") Q19814 ("Romance languages") Q749533 ("Iberian Romance languages") Q1377152 ("West Iberian languages")
Q7737 ("Russian") Q19860 ("Indo-European languages") Q23526 ("Slavic languages") Q144713 ("East Slavic languages") Q147356 ("Balto-Slavic languages")
Q8748 ("Albanian") Q19860 ("Indo-European languages") Q35976 ("Illyrian languages") Q1815070 ("Paleo-Balkan languages")
Q9299 ("Serbian") Q19860 ("Indo-European languages") Q146665 ("South Slavic") Q23526 ("Slavic languages")

The "group/subgroup/branch" doesn't seem to be implemented consistently. Is "branch" a higher taxonomic rank than "group"? Gabbe (talk) 14:56, 4 May 2013 (UTC)

In Russian, "branch" ("ветвь") is higher than group. For Spanish, e.g., the branch is "Italic". But I suppose that in linguistics the names of taxonomic ranks are not as standardized as in biology... Infovarius (talk) 21:12, 6 May 2013 (UTC)
Moreover, it seems that there is a scientific classification with strictly defined levels: ru:Языковая систематика.

Relating languages

  • There are a bunch of extant properties for familial relationships between languages: P133, P134, P283, P284, P285.
    • It is not at all clear that P283, P284 and P285 are useful in the presence of P133:
      • A family-group-subgroup-branch hierarchy is not standard in linguistics
      • Germanic languages (Q21200) is a language family as much as its parent Indo-European languages (Q19860), in that it has descendants.
      • A four-level hierarchy is already unsuitable for describing many languages. For example, English (Q1860) has 5 levels of ancestors on enwiki (and it doesn't even include Ingvaeonic!).
      • A fixed-depth hierarchy with fixed names is not futureproof. What happens if new superfamilies are shown to exist? New intermediate groupings?
    • For these reasons, I think deleting these three properties is a good plan, regardless of any other consideration. None of the taxonomic properties have much current usage, so there isn't much updating that would have to be done.
  • How can diachronic derivation be represented? It would be nice to say that English (Q1860) is derived from Early Modern English (Q1472196) is derived from Middle English (Q36395) (...) Proto-Germanic (Q669623) (...). This would probably need a new property, developed-into.
    • Supposing we do this, English (Q1860) wouldn't need to directly mention it's a member of Germanic languages (Q21200), since that's implied by being a descendant of Proto-Germanic (Q669623). There would also be a new property needed here, developed-into-family. Is it better to have these relations normalised & deduplicated or denormalised & duplicated?
    • This doesn't work so perfectly where there isn't a single well-defined protolanguage, e.g. Gallo-Romance languages (Q500394). Maybe it could go Latin developed-into Vulgar Latin developed-into-family Romance has-subclass Gallo-Romance has-member Old French developed-into French?
    • How should Modern English (Q1649537) and Middle English (Q36395) be related to English (Q1860)? Derived-from doesn't seem quite right, since 'English' often encompasses everything back to Anglo-Saxon. If there were a linguistic-variety property and a temporal (diachronic?) qualifier, we could express that relationship nicely. That could also then subsume P134, which would use some other qualifier (geographical? Synchronic?).
  • Controversial and uncertain language groupings could be distinguished by qualifiers, or maybe widely accepted ones should be the qualified ones. Then data users in Wikipedias can decide not to display the most uncertain families at all, and to specially mark the doubtful ones. What about cases with a diamond pattern, e.g. P/Q versus continental/insular grouping of Celtic languages (Q25293)?

KleptomaniacViolet (talk) 14:44, 1 September 2013 (UTC)
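
To illustrate the point about fixed-depth hierarchies, here is a minimal sketch (not how Wikidata stores anything) in which each languoid carries a single parent link and the full ancestry is derived by walking the chain, so any depth works. The `PARENT` table and both function names are hypothetical; the nesting order of the four groupings is assumed from the Spanish row quoted earlier on this page.

```python
# Hypothetical parent links; the nesting order is an assumption taken
# from the Spanish example row in the taxonomy discussion above.
PARENT = {
    "Q1321": "Q1377152",    # Spanish -> West Iberian languages
    "Q1377152": "Q749533",  # West Iberian -> Iberian Romance languages
    "Q749533": "Q19814",    # Iberian Romance -> Romance languages
    "Q19814": "Q19860",     # Romance -> Indo-European languages
}


def ancestors(item, parent=PARENT):
    """Return the ancestors of item, nearest first, for any depth."""
    chain = []
    seen = {item}
    while item in parent:
        item = parent[item]
        if item in seen:  # guard against accidental cycles in the data
            raise ValueError(f"cycle detected at {item}")
        seen.add(item)
        chain.append(item)
    return chain


def top_level_family(item, parent=PARENT):
    """Return the outermost ancestor, or None if item has no parent."""
    chain = ancestors(item, parent)
    return chain[-1] if chain else None
```

With this shape, adding a new intermediate grouping or a newly accepted superfamily only means adding or changing one parent link; no new property per level is needed.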

I've gone ahead and nominated P283, P284 and P285 for deletion, so head over to WD:PFD if you feel strongly about it. KleptomaniacViolet (talk) 15:26, 3 September 2013 (UTC)

For your information, Malaysian (Q15065) and Indonesian (Q9240) use subclass of (P279). Visite fortuitement prolongée (talk) 20:36, 22 September 2013 (UTC)

Malay (Q9237) and Malayic languages (Q662628) too. Visite fortuitement prolongée (talk) 19:20, 25 October 2013 (UTC)
Malayo-Sumbawan languages (Q1363818) and Nuclear Malayo-Polynesian languages (Q1190607) too. Anglic languages (Q1346342), French (Q150), German (Q188), Spanish (Q1321), Finnish (Q1412), Hindi (Q1568), English (Q1860), Catalan (Q7026), Russian (Q7737), Portuguese (Q5146), Yiddish (Q8641), Albanian (Q8748), Ukrainian (Q8798), Belarusian (Q9091), Serbian (Q9299), West Frisian (Q27175), use part of (P361). Visite fortuitement prolongée (talk) 21:15, 27 October 2013 (UTC)
Made a correction because it was my misprint. I propose to use either part of (P361) or subclass of (P279) for building the hierarchy (I mean to use P133 only for the highest rank). Let us choose one. Infovarius (talk) 17:15, 28 October 2013 (UTC)
Made correction. Visite fortuitement prolongée (talk) 21:06, 30 October 2013 (UTC)
Found Help:Basic membership properties. Visite fortuitement prolongée (talk) 21:33, 6 November 2013 (UTC)

After the deletion of the intermediate properties, we have to define the use of P133 precisely at Property talk:P133. Infovarius (talk) 21:24, 26 October 2013 (UTC)

  Done Visite fortuitement prolongée (talk) 21:06, 30 October 2013 (UTC)
Oops, I saw this discussion too late... I started to edit a lot of languages and I used part of (P361) instead of subclass of (P279). Is there a way to find the language items which use part of (P361) in order to replace it with subclass of (P279)? Pamputt (talk) 19:25, 21 August 2014 (UTC)
Yes, it can be done by Autolist(2). --Infovarius (talk) 14:28, 22 August 2014 (UTC)
Mmmh, I found Autolist but I am a bit lost. I do not understand what I have to write in the box. Any example would help me a lot :D Pamputt (talk) 14:46, 22 August 2014 (UTC)
I found how to use Autolist, so there are for now 456 items which have the property part of (P361). Should we remove all of them? Pamputt (talk) 15:54, 22 August 2014 (UTC)
Oh, sorry for the misinformation: if the targets of the property differ, Autolist cannot help to change them... Infovarius (talk) 19:55, 22 August 2014 (UTC)

More properties for infobox importing

To import all the relevant data from w:en:Template:Infobox language, some more properties will be needed:

  • Native name of the language (mapping to the nativename parameter)
  • Locations where it's spoken (from the state and region parameters)
  • Number of speakers (from the speakers parameter; needs to wait for the Number datatype?)
  • Date of extinction (extinct parameter)
  • Places where it has official recognition of some kind, with the type of recognition as a qualifier (nation and minority parameters)
  • Regulatory body of the language (agency parameter)

If no-one has any thoughts on the matter, I'll go over to properties for creation and ask for these. KleptomaniacViolet (talk) 18:29, 1 September 2013 (UTC)

See Wikidata:Property proposal/Term#Languages_.2F_Sprachen_.2F_Langues KleptomaniacViolet (talk) 16:43, 3 September 2013 (UTC)

Main item

Which is the main item for English, Deutsch, русский and so on? Q4113741, Q315, or Q34770? Infovarius (talk) 09:38, 26 November 2013 (UTC)

ISO 639-5

There is no ISO 639-5 property. For example, currently "Greek (Q9129) IETF language tag (P305) grk", where "grk" is an ISO 639-5 code, but there is no "Greek (Q9129) ISO 639-5 grk" statement. Visite fortuitement prolongée (talk) 20:20, 30 November 2013 (UTC)

See Wikidata:Property proposal/Term#ISO 639-5. Visite fortuitement prolongée (talk) 22:32, 29 December 2014 (UTC)
created: ISO 639-5 code (P1798). Visite fortuitement prolongée (talk) 20:46, 8 May 2015 (UTC)

Main item for properties constraint

Infovarius wrote in 89562576 "I propose something like Q3329375 as the upper level of all linguistic forms of communication". I followed this suggestion. Now, the following properties use language variety (Q3329375) in their constraint:

The following items are sub-class of language variety (Q3329375) directly:

The following items are sub-class of Q3329375 indirectly:

Comments are welcome. Visite fortuitement prolongée (talk) 20:51, 30 November 2013 (UTC)

Now sign language (Q34228) P279 Q34770. Visite fortuitement prolongée (talk) 20:56, 2 December 2013 (UTC)
@Visite_fortuitement_prolongée @Infovarius @GranD @Melody_Lavender @ColinFine @Mateusz.ns @Fawkesfr I added some maintenance links at user:Gangleri/sandbox/index/languages. Regards gangLeri לערי ריינהארט (talk) 01:58, 15 May 2014 (UTC)

I replaced language variety (Q3329375) by languoid (Q17376908) Visite fortuitement prolongée (talk) 22:21, 26 December 2014 (UTC)

Language support

I have started a discussion at Wikidata:Project_chat#Language_support about the lack of language support. See also mw:Talk:Unrecognised_languages#Official_languages. John Vandenberg (talk) 13:11, 31 December 2013 (UTC)

Wikidata, BNCF, GND, DDC, Omegawiki

Hi! I found this page by coincidence. Please note . Can anybody translate the note into English? Thanks in advance! gangLeri לערי ריינהארט (talk) 19:51, 14 May 2014 (UTC)

workflow topics

Hi! There are several thousand languages. Has it been discussed previously to implement / define a workflow strategy? I can imagine that one could use maintenance statements inside the items. Until some general maintenance statements are available in WD, these could be sandbox statements which define

a) a common mark for new items / candidates for this task force
b) marking the quality level of the item, as
  • hierarchical statements reviewed
  • labels and descriptions reviewed (consistent usage of singular and plural)
  • aliases reviewed
  • ping for translations to major languages
c) no WMF article available
d) marking inconsistencies (links to disambiguations, different meanings across languages, links to external ontologies)
e) missing external links to major sites, ISO, BNCF, GND, Freebase etc.
f) marking other items (not languages) relevant for this task force
m) merge proposals
s) splitting proposals

The question arrived because actual external lists, bookmarks etc. are neither efficient nor available for all contributors. Most statements should be temporary (except a quality mark). Regards gamgLeri לערי ריינהארט (talk) 15:31, 15 May 2014 (UTC)

ping (all contributors to this project / project talk page): @Visite_fortuitement_prolongée, @GranD, @Melody_Lavender, @Mateusz.ns, @Fawkesfr, @ColinFine, @Kristian_Vangen, @Hazard-SJ, @Hydriz, @Kolja21, @Haplology, @Moe_Epsilon, @Lam-ang, @Wylve, @Wikitorrens, @FischersFritz, @Nizil_Shah, @Sumone10154, @Stryn, @Janjko, @Ymblanter, @Dharmadhyaksha, @PiRSquared17, @John_Vandenberg, @KleptomaniacViolet, @Infovarius, @Zolo, @Gabbe, @Giftzwerg_88, @Merlissimo, @Haplology, @Ymblanter, @Yair_rand
Hi! I would like to have a "collection" notion from,, etc. ontologies where all identified items are inserted in a first step. Later the day / as soon as a schema is known / as soon as one of the major scientific models is chosen the items are linked to the basic items of the collection.
One could either create such an item or use sandbox properties linking to a WD page. Can be tested at Wikidata Sandbox (Q4115189) .
Please do not add polemic contributions! Thanks in advance! gangLeri לערי ריינהארט (talk) 09:26, 25 May 2014 (UTC)


Indicating "parent" of pidgin

Hello. It should be possible to indicate that a certain language is a pidgin based on other language(s). Any idea how to do so? Similarly, it would be useful to indicate the languages a constructed language is mainly based on/derived from. πr2 (t • c) 23:07, 4 July 2014 (UTC)

Language code properties

For reference:

πr2 (t • c) 03:43, 5 July 2014 (UTC)

Properties_for_deletion: P133

Hello, could you please have a look at Wikidata:Properties_for_deletion#.7B.7BPfD.7CProperty:P133.7D.7D? Pamputt (talk) 19:28, 22 August 2014 (UTC)

Wikidata:Properties for deletion#IPA (P898)

Listed there to change the datatype. --- Jura 17:38, 19 February 2015 (UTC)

2015-05 Ryukyuan languages

@Pamputt: The contributor X meta has just modified several items about Ryukyuan languages. Could you look at these changes? Visite fortuitement prolongée (talk) 20:14, 22 May 2015 (UTC)

@Visite fortuitement prolongée: do you have specific examples (or the period when he made the changes)? At first glance he has not made many edits to languages, and those are buried among the others. Pamputt (talk) 09:10, 23 May 2015 (UTC)
It's fine, I found them; it was around 21/05/2015 10:00. I have gone back over them. Pamputt (talk) 09:29, 23 May 2015 (UTC)
Should Q7082043 and Q19965899 be merged? Visite fortuitement prolongée (talk) 18:02, 23 May 2015 (UTC)
@X meta: could you confirm that Okinawan languages (Q7082043) and Northern Ryukyuan languages (Q19965899) refer to the same thing? Pamputt (talk) 21:50, 23 May 2015 (UTC)
It's not the same, because Northern Ryukyuan languages (Q19965899) is a general term for Kunigami (Q56558), Yoron (Q2424943) and Okinoerabu (Q3350036), but Okinawan languages (Q7082043) is a general term for Kunigami (Q56558) and Okinawan (Q34233). --X meta (talk) 04:49, 24 May 2015 (UTC)
Ok, thank you. Pamputt (talk) 05:07, 24 May 2015 (UTC)

2015-05 language classification

There is not yet a policy or a scheme for the classification of languages, language families and other languoids (Q17376908). How should it be done? Which properties should be used? The same for all cases, or different ones depending on the type? Note Help:Basic membership properties. Visite fortuitement prolongée (talk) 18:02, 23 May 2015 (UTC)

Smaller group

P134 can only be used when the target is a dialect(s). Visite fortuitement prolongée (talk) 18:18, 23 May 2015 (UTC)

Bigger group

P171 can only be used between taxon(s), but we could create a similar property for languoid(s). Visite fortuitement prolongée (talk) 18:18, 23 May 2015 (UTC)

subclass of (P279) is enough if the referenced item has instance of (P31) set to one of language (Q34770), language family (Q25295) or language group (Q941501). Paweł Ziemian (talk) 21:02, 23 May 2015 (UTC)

Other discussion 1

Remove P527 and P361? See "CLAIM[31:(TREE[17376908][][279])] AND CLAIM[527]" (resp. "AND CLAIM[361]") in . Visite fortuitement prolongée (talk) 19:47, 13 June 2015 (UTC)

Query: CLAIM[31:(TREE[17376908][][279])] AND CLAIM[527] and Query: CLAIM[31:(TREE[17376908][][279])] AND CLAIM[361]. Visite fortuitement prolongée (talk) 19:30, 22 June 2015 (UTC)

Subclass of family

Hi Visite fortuitement prolongée, I saw that you changed quite a few statements so that languages are a subclass of a language family; that does not seem really correct to me. A language is perhaps a subclass of communication (the class of all communications that have actually taken place in that language), but certainly not a subclass of a language family: that would make a language a set of languages all by itself ... it does not fit.

We should rather have:

⟨ français ⟩ instance of ⟨ langue latine ⟩


⟨ langue latine ⟩ subclass of (P279)   ⟨ langue ⟩

("langue" is the set of all languages; the "langues latines" are a subset of it) and

⟨ langue latine ⟩ instance of (P31)   ⟨ famille de langue ⟩

. TomT0m (talk) 20:46, 15 June 2015 (UTC)

What do you mean by "langue latine"? I cannot find any Wikidata item with "langue latine" as its label, in any language. Visite fortuitement prolongée (talk) 21:15, 15 June 2015 (UTC)
I guess I mean Romance languages (Q19814), but I think you already know that. TomT0m (talk) 21:19, 15 June 2015 (UTC)

Visite fortuitement prolongée, if you want to rank families, it is quite possible to say

⟨ Romance languages (Q19814)      ⟩ instance of (P31)   ⟨ super famille de langue ⟩

(or whatever rank is relevant) with

⟨ super famille de langue ⟩ subclass of (P279)   ⟨ language class ⟩

(as the set of instances of super families is a subset of all language classes).
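
The modelling proposed in this thread can be sketched as a small check that follows one instance of (P31) link and then any number of subclass of (P279) links. This is only an illustration under stated assumptions: the statements below use the labels from the discussion rather than real item IDs, and `superclasses` and `is_instance_of` are hypothetical helper names.

```python
P31, P279 = "P31", "P279"

# Hypothetical statements mirroring the proposal above; the subjects and
# objects are the labels used in the discussion, not real item IDs.
STATEMENTS = {
    ("français", P31, "langue latine"),
    ("langue latine", P279, "langue"),
    ("langue latine", P31, "famille de langue"),
}


def superclasses(cls, statements):
    """Return cls plus every class reachable from it via P279 links."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for subj, prop, obj in statements:
            if subj == current and prop == P279 and obj not in found:
                found.add(obj)
                todo.append(obj)
    return found


def is_instance_of(item, cls, statements):
    """True if item has a P31 value that is cls or a P279-subclass of cls."""
    return any(
        cls in superclasses(obj, statements)
        for subj, prop, obj in statements
        if subj == item and prop == P31
    )
```

Under these statements, "français" counts as an instance of "langue" but not of "famille de langue", while "langue latine" is an instance of "famille de langue", which is the distinction the proposal is after.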

First step

First step. Visite fortuitement prolongée (talk) 20:22, 19 June 2015 (UTC)

Other discussion 3

The Wikidata items about dialect lists will be processed later. A non-exhaustive list of those items: West Palatine (Q22587) Spanish dialects and varieties (Q251211) Romanian subdialects (Q471107) Swedish dialects (Q586239) Hungarian dialects (Q837646) regional Italian (Q1098467) Basque dialects (Q1218937) Japanese dialect (Q1246128) Dutch dialects (Q1323611) Varieties of Arabic (Q1422423) Norwegian dialects (Q1509562) Russian dialect (Q2121919) German dialects (Q2306552) Ancient Greek dialects (Q2477440) list of dialects of English (Q2631145) Varieties of Modern Greek (Q2742027) Breton dialects (Q3025955) Catalan dialect (Q3025977) Bengali dialects and varieties (Q3554758) Varieties of French (Q3554836) Aragonese dialects (Q3571271) Occitan dialects (Q3706547) Belarusian dialect (Q4161023) dialect of Ukrainian (Q4161027) Nahuan languages (Q11965602) Mandarin dialects (Q19952336) Visite fortuitement prolongée (talk) 19:47, 9 July 2015 (UTC)

varieties of Chinese (Q2748296). Visite fortuitement prolongée (talk) 19:49, 16 July 2015 (UTC)

Other discussion 4

Langues italo-romanes

Any idea which Wikidata item w:fr:Langues italo-romanes should be linked to? It is a superset of Gallo-Italic languages (Q516074), Italo-Dalmatian languages (Q3313381), and Italian romance languages (Q3356483). Visite fortuitement prolongée (talk) 19:52, 13 June 2015 (UTC)

@Visite fortuitement prolongée: from the Spanish article, we see that "lenguas italianas centromeridionales" is also called "lenguas italoromances", which fits quite well with the French name. The map given in the three articles of Italian romance languages (Q3356483) is also the same as the one shown in the French article. I also merged it with Italian romance languages (Q17635594), which refers to the same thing. Pamputt (talk) 08:17, 14 June 2015 (UTC)


Are Special:Diff/206922469 and Special:Diff/206922544 errors or vandalism? Visite fortuitement prolongée (talk) 19:32, 22 June 2015 (UTC)

Fusional language

The two concepts of fusional language and flectional language were mixed in Q318917 (since about 2007; it is not Wikidata's fault). Please move labels, correct labels and move sitelinks from fusional language (Q318917) to flectional language (Q20440513) where needed. Visite fortuitement prolongée (talk) 19:43, 27 June 2015 (UTC)

Same for Category:Fusional languages (Q8476030) and Category:Fusional languages (Q20440609). Visite fortuitement prolongée (talk) 19:57, 27 June 2015 (UTC)

Guthrie code

See Wikidata:Property_proposal/Authority_control#code_Guthrie for the property proposal. Pamputt (talk) 23:00, 14 July 2015 (UTC)

Wikimania 2016

Only this week left for comments: Wikidata:Wikimania 2016 (Thank you for translating this message). --Tobias1984 (talk) 11:48, 25 November 2015 (UTC)

Adding the UNESCO Atlas of the World's Languages in danger to Wikidata

Hi all

I'm involved in adding all languages listed in the UNESCO Atlas of the World's Languages in Danger to Wikidata using Mix n' Match; just scroll down the list to find the correct catalogue. I hope you can help; there are instructions on how to use Mix n' Match in Manual mode here.


John Cummings (talk) 10:14, 26 January 2016 (UTC)

Thanks for the reminder. I matched some items and finished translating the Meta documentation of this game into French. There is still a lot of work to do (a bit fewer than 2000 entries still to match). Pamputt (talk) 23:03, 27 January 2016 (UTC)

2016-03 Experiments

What to do for the following experiments? Delete them? Visite fortuitement prolongée (talk) 21:14, 13 March 2016 (UTC)

type of language (Q20829075) is a logical thing. I personally don't need the others. You should ping the creators, I think. --Infovarius (talk) 10:14, 17 March 2016 (UTC)


FYI, I proposed to create a new property for word order (Q257885). If you are interested, it is here. Pamputt (talk) 16:11, 2 July 2017 (UTC)


Hello, a lot of Glottolog code (P1394) values still have to be added to the language items. This query shows some of the items that have an ISO 639-3 code (P220) and no Glottolog code (P1394):

SELECT DISTINCT ?item ?itemLabel ?itemDescription ?iso
WHERE {
    ?item wdt:P220 ?iso . # languages that have an ISO 639-3 code
    MINUS { ?item wdt:P1394 ?glottolog . } # with missing Glottolog ID
    MINUS { ?item rdf:type wdno:P1394 . } # excluding those with a "no value" Glottolog ID
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
    OPTIONAL { # assumed OPTIONAL block; the original grouping is not shown
        ?item schema:description ?itemDescription .
        FILTER(LANG(?itemDescription) = "[AUTO_LANGUAGE],en") # with missing "your language" description
    }
}
ORDER BY ?itemLabel

Currently there are a bit less than 500 items that still need to be matched. Pamputt (talk) 06:07, 21 August 2017 (UTC)
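
For anyone who wants to run this kind of check outside the query editor, here is a rough sketch that builds the request URL without sending anything. It assumes the public endpoint is https://query.wikidata.org/sparql and that it accepts `query` and `format` parameters; `missing_property_query` and `request_url` are hypothetical helper names, and the generated query is a simplified variant of the one above (no labels, no "no value" handling).

```python
from urllib.parse import urlencode

# Assumed public endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def missing_property_query(has_prop, lacks_prop, limit=100):
    """Build a query for items that have has_prop but no lacks_prop."""
    return (
        "SELECT DISTINCT ?item WHERE { "
        f"?item wdt:{has_prop} ?value . "
        f"MINUS {{ ?item wdt:{lacks_prop} ?other . }} "
        f"}} LIMIT {limit}"
    )


def request_url(query):
    """Return a GET URL for the query; nothing is sent over the network."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Items with an ISO 639-3 code (P220) but no Glottolog code (P1394).
url = request_url(missing_property_query("P220", "P1394", limit=50))
```

The same helper can be reused for other identifier pairs by changing the two property IDs.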

Wikidata for places a language is spoken in

At d:Wikidata:Project chat#Places a language is spoken in? I asked whether information about where a language is spoken could be added to Wikidata. The agreement seemed to be that it should, but I'm not sure how it would be done, as I have very little experience with Wikidata. It was suggested there that I should take the matter here, so here I am.

There are a few points to consider:

  • Numbers of speakers at a given point in time
  • First vs second-language speakers
  • Former vs current speakers (and handling of extinct languages)
  • Languages spoken in a given country vs countries a given language is spoken in. Should smaller-than-country places be considered?

I would appreciate any feedback and help with working out the data format for this. CodeCat (talk) 17:47, 15 September 2017 (UTC)
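
As a starting point for that discussion, here is one possible shape for such data, written as a plain record with qualifier-like fields. Everything in it is an assumption made for illustration: the `SpokenIn` class, its field names and the sample rows are hypothetical, no speaker counts are given, and `Q_LANG_X` / `Q_PLACE_Y` are placeholders rather than real items.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpokenIn:
    language: str                   # item ID of the language
    place: str                      # item ID of a country or smaller place
    speakers: Optional[int] = None  # number of speakers, if known
    year: Optional[int] = None      # point in time the count refers to
    speaker_type: str = "any"       # "first", "second" or "any"
    current: bool = True            # False for former speakers


# Hypothetical sample rows; counts are deliberately left empty.
SAMPLE = [
    SpokenIn("Q33947", "Q20"),  # Northern Sami, Norway (Q20)
    SpokenIn("Q33947", "Q33"),  # Northern Sami, Finland (Q33)
    SpokenIn("Q_LANG_X", "Q_PLACE_Y", current=False),  # placeholder: formerly spoken
]


def places_for(language, statements, current_only=True):
    """Places where a language is (or, optionally, was) spoken."""
    return sorted(
        s.place for s in statements
        if s.language == language and (s.current or not current_only)
    )


def languages_in(place, statements, current_only=True):
    """Languages spoken (or, optionally, formerly spoken) in a place."""
    return sorted(
        s.language for s in statements
        if s.place == place and (s.current or not current_only)
    )
```

Storing one record per language and place lets both directions of the last bullet (languages of a country, countries of a language) be answered from the same data, and the `place` field does not care whether the item is a country or something smaller.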

Natural vs constructed language

Currently, we have constructed language (Q33215), which is a language created for humans to use. However, it is a subclass of artificial language (Q3247505), whose description says it applies to "language which emerge either in computer simulations between artificial agents, robot interactions or controlled psychological experiments with humans". Clearly, a constructed language does not fit this description, but it is indeed an artificial language, so the description could do with some attention. Meanwhile, natural language (Q33742) has the description "language naturally spoken by humans, as opposed to "formal" or "built" languages", but this is also not accurate. Constructed languages are also naturally spoken by humans, they can have native speakers if their parents teach them the language. The real key thing about natural languages is that they arise and evolve naturally within a community that needs to communicate, without specific intent to create the language from scratch.

The higher-level entity language (Q34770) seems intended for all languages, whether for humans or otherwise. computer language (Q629206) is subclassed under it, for example. Yet one of the labels for language (Q34770) is "human language". This is obviously not correct for a computer language. I think it should be used for "language" in the most abstract sense, and a new entity should be created called "human language". A human language is a language that is spoken, or intended to be spoken, between humans. It includes both natural languages like English and constructed languages like Esperanto, but not non-human artificial languages such as HTML. This subdivision is not only natural, but also useful for projects like Wiktionary which deal with human languages specifically. CodeCat (talk) 18:13, 15 September 2017 (UTC)

@CodeCat: I think I fully agree with all your remarks. So we can wait for other opinions and make the changes you propose in a few days if there is no opposition. Pamputt (talk) 19:47, 17 September 2017 (UTC)
A problem that I see is that language (Q34770)'s various Wikipedia pages all seem to refer to human languages, i.e. the thing that the to-be-created entity for "human language" is about. So we may want to rename that one to "human language" and create a more abstract parent entity for language in general. I'm not sure what the best course is. CodeCat (talk) 20:57, 18 September 2017 (UTC)
1) I think that a definition "natural languages is ...[which] arise and evolve naturally within a community that needs to communicate, without specific intent to create the language from scratch" fails for some artificial languages such as Interlingua (it evolves naturally within communities) and for some natural languages such as Hebrew (it was specifically created).
2) no more "language" items please! Choose one from: speech (Q52946)     , parole (Q10485156)     , language (Q34770)     , language (Q4536530)     , langue (Q4113741)     , language (Q315)     , languoid (Q17376908)     .
3) of course, you should not forget about gesture languages (natural? but for sure human, and not spoken), animal languages (are they also spoken by humans?)
4) computer languages can be read (orally also) and written by humans, and some programmers can even communicate in them. Infovarius (talk) 14:43, 20 September 2017 (UTC)

Showcase item

Hello, I think it would be good for the WikiProject languages to develop one item as a showcase item. It will help us and future contributors to use this item as an example for other languages. I propose to work on Northern Sami (Q33947), since a lot of work has already been done on this item. Please see Talk:Q33947 if you are interested. Pamputt (talk) 13:04, 1 October 2017 (UTC)

  • It seems it has quite some way to go, as the item has barely any references: only a single, tertiary one. Maybe it's easier to feature one with fewer statements to reference ;)
    --- Jura 13:15, 1 October 2017 (UTC)
    I may agree with you. However, I chose this item mainly because it is quite exhaustive, which means that it can use a lot of language-related properties. Pamputt (talk) 14:01, 1 October 2017 (UTC)
  • @Pamputt: Firstly, I disagree with using located in the administrative territorial entity (P131) and country (P17) in language items. Secondly (my ache), how do you prove that this is language (Q34770) and not language (Q315) or human language (Q20162172)? --Infovarius (talk) 14:55, 12 April 2018 (UTC)
    • @Infovarius: about your first point, could you give us some arguments? From my point of view, this is really useful in order to link a language to an area where it is spoken. This information may be used by external projects (such as Lingua Libre) in order to suggest to their users languages that they may speak. On the second point, I do not have any opinion; IMHO all languages should have the same Q-item (language (Q34770), language (Q315), human language (Q20162172) or whatever) so that we can get all languages with a SPARQL query. For now, all languages use language (Q34770), which is why this item is used on Northern Sami (Q33947). If one changes it here, it should be modified everywhere. Pamputt (talk) 15:06, 12 April 2018 (UTC)
      • The main problem I see with P131 and P17 is that these properties are intended for geographical objects that are connected with only one or a few administrative divisions. The other problem is that any "big" language can be spoken in any country (at least by 1 human); this is especially true for English. And having a big (and often incomplete) list doesn't tell us which is the "main" country of that language. At the very least we should have qualifiers with the number of speakers. --Infovarius (talk) 20:51, 13 April 2018 (UTC)

Luri (Q4701277)

Hi, I have no time to deal with Luri (Q4701277) now, so I am leaving a message here in case someone is interested in fixing this item. The problem is that there is a mix between two languages: Lori languages and Luri language. Thanks in advance if someone has a look at this. Pamputt (talk) 22:29, 23 October 2017 (UTC)

Property proposal: "part of speech"

I've proposed Wikidata:Property proposal/part of speech. Your comments will be greatly appreciated. Deryck Chan (talk) 13:12, 2 November 2017 (UTC)

Endangered languages

Hi! A new catalog is available and needs your help on Mix'n'match: Endangered languages. Pamputt (talk) 19:03, 2 March 2018 (UTC)

Sorosoro ID

I have proposed a new property related to languages: Sorosoro ID. Please give your opinion on the proposal page. Pamputt (talk) 20:20, 13 March 2018 (UTC)

Project Languages & Lexicographical data

Hello all!

I wanted to ask if this project is only about having the list of all languages in Wikidata, or if you imagine a broader scope for the project? :)

With the deployment of the first experiment of lexicographical data in a few weeks, we will certainly have a lot of interesting discussions about languages and how to describe them. For example, one of the first fields that editors have to fill in when creating a new Lexeme will be the language, and they will be able to enter any existing Wikidata item.

We will also need help from people who have a good knowledge of languages and how they are organised and structured, for example to propose and create new properties for lexicographical data.

In general, I wish we could involve the project more in all decisions related to languages on Wikidata. I think more transparency is needed about how languages are added to Wikidata (languages of labels, for example), and WikiProject Languages could be a good place for that.

What do you think? Do you have any ideas or suggestions? Let me know :)

Pinging @Pamputt, Deryck Chan, Rua, Syced: who are regularly involved on this page.

Cheers, Lea Lacroix (WMDE) (talk) 10:38, 11 April 2018 (UTC)

  • If you could come up with an improved way to create codes for monolingual text properties, that would be great. Currently, it's not really working, at least for Guernsey French and Montenegrin.
    --- Jura 11:44, 11 April 2018 (UTC)
  • I love languages and learning them, and even though I am not a linguist at all, I am interested in lexicographical data. I will try to have a look! The relevant conversations should probably happen there I guess, but thanks for pinging us! Syced (talk) 12:49, 11 April 2018 (UTC)
    I am not the one you are looking for when you speak about someone who has good knowledge of languages; I am only interested in them. That said, here is what I think about this project and how it can interact with the lexicographical data. As you wrote, I think the most obvious link is to translate the labels of as many language items as possible so that they can be found easily by many people speaking different languages (I am currently working on this task for the French language). I do not see other uses for now. Pamputt (talk) 17:46, 11 April 2018 (UTC)
  • Thanks Léa for considering me a "regular"...! I see two sensible options to code the language of a Lexeme: either use ISO 639-3 or accept any Wikidata item. I don't feel strongly about either approach. On the other hand, Wiktionary currently contains a lot of "translingual" entries, for example origins and definitions of Chinese characters. The Wikidata "Monolingual text" datatype also allows "mul" as the language, which shows that the text is used in multiple languages. I can't pin down a comprehensive data structure within this comment, though I would suggest that the Lexeme developers look at existing Wiktionary entries like this one about a Chinese ideogram and find an appropriate way for Lexemes to encode this information. Deryck Chan (talk) 14:59, 17 April 2018 (UTC)

Your input about requesting new languages for Wikidata

Hello all,

In case you didn't see it, we started a discussion about the existing processes to request and add new languages in Wikidata, the potential issues users encounter, and ideas for solutions. I'd love your feedback on these topics. Feel free to participate in the discussions on the talk page!

Thanks, Lea Lacroix (WMDE) (talk) 08:59, 10 January 2019 (UTC)


Is it common to use language of work or name (P407) to state which languages use a writing system?

Location of languages


The discussion at Wikidata:Project chat#Location of languages (about whether items about languages should have coordinates) might interest you.

Cheers, VIGNERON (talk) 16:48, 4 July 2020 (UTC)
