Welcome to Wikidata, Ivi104!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox. Once again, welcome, and I hope you quickly feel comfortable here and become an active editor for Wikidata.

Best regards! - cycŋ - (talk · contribs · logs) 06:35, 22 July 2016 (UTC)

Merging

Hello Ivi104,
When you merge items, you may want to use the merge.js gadget from the help page about merging. It helps with merging and nominating, gives the option to always keep the lower number (which is older, so preferable in most cases), and makes it a lot easier for the admins to process the requests.
With regards, - cycŋ - (talk · contribs · logs) 06:35, 22 July 2016 (UTC)

Some lexeme clarifications

Would you be able to explain the differences between voda (L184574), вода/voda (L2068), and вода/voda (L2081)? Mahir256 (talk) 16:33, 12 September 2021 (UTC)

@Mahir256: Hello! This is the same term in three different languages: Croatian (hr), Serbian (sr) and the common historical macrolanguage Serbo-Croatian (hbs). The difference is that the Croatian language uses exclusively Latin script and the Ijekavian dialect in official writing (e.g. world = svijet), while the other two use both Latin and Cyrillic script and the Ekavian dialect (e.g. world = svet). Ivi104 (talk) 17:03, 12 September 2021 (UTC)
Thank you for your answer! I have a few more questions, then:
  • I thought Croatian was based on the Kajkavian (Q838165) dialect, not "Ijekavian"?
  • I also thought Serbian had both "Ijekavian" and "Ekavian" varieties; if Croatian has a similar divide, how do you plan to handle that in lexemes?
  • Would you be able to explain the differences between the latter two lexemes (the ones in sr and hbs)? Mahir256 (talk) 18:02, 12 September 2021 (UTC)
@Mahir256: Thank you for your questions:
  • The Croatian base dialect is Shtokavian (Q148893) (local name štokavsko narječje or ijekavica, which is why I used the term Ijekavian). Kajkavian (Q838165) dialect is used in and around the capital city and surrounding regions, but any official communication (media, laws, news, ...) is always in Shtokavian (Q148893).
  • Serbian base dialect is Ekavian.
  • Hbs is a language variant that is used for linguistic purposes only; it has no active speakers. Historically, during the 20th century, Serbia, Croatia, Bosnia, Slovenia and other Balkan countries were grouped together in a Socialist Federation of Yugoslavia, which used hbs as its primary language. With the breakup of Yugoslavia in the 1990s, the hbs language split into Croatian (hr), Serbian (sr), Bosnian (bs), Slovenian (sl) and others.
  • Seeing as Wikidata only allows one language per lexeme, some lexemes will likely be duplicated, especially between sr and hbs, and to a lesser extent between hr and sr. These lexemes may only differ in a few forms or even not at all (as is the case with вода/voda (L2068), and вода/voda (L2081)). Ivi104 (talk) 20:10, 12 September 2021 (UTC)
Thanks again for your answers! I'm glad to see that you're adding lexicographical data in your language, and don't want to discourage you from contributing to it, so apologies in advance if the next few questions discomfort you at all:
  • So Croatian is based on Shtokavian, and I remember reading elsewhere that Serbian is also based on Shtokavian. Interestingly, the words for 'world' you mentioned in your initial reply have entries on the English Wiktionary in what it says are their Ekavian, Ijekavian, and Ikavian forms, each under the heading "Serbo-Croatian" (and not separately as "Serbian", "Croatian", and so on). I also see that the English Wikipedia article on "Shtokavian" provides the language codes "sh/hbs", which are different from the Chakavian "ckm" and Kajkavian "kjv". Do you think that English speakers, then, are getting something very wrong?
  • (I've been told by a Macedonian speaker that Slovenian and Macedonian were already clearly separate from what people might call hbs, rather than being part of the language split that came from Yugoslavia's breakup.) Does that mean two people who might have answered 'what language do you speak' with the exact same answer before Yugoslavia's breakup might answer differently after the breakup even if the way they spoke didn't actually change? Mahir256 (talk) 15:53, 13 September 2021 (UTC)
@Mahir256: Thank you for your questions!
  • There is a distinction here that I feel is not that well understood in English: the base dialect names come from their respective word for "what?": a Shtokavian speaker will ask "što?", a Kajkavian speaker will ask "Kaj?" and a Chakavian will ask "Ća?" In that sense, both Croatian and Serbian use Shtokavian as the official norm. The other distinction is how each group handles the Proto-Slavic "jat" sound. Let's take the example for "world" once again: Croatian official writing would use "svijet" (Ijekavian), Serbian official writing and Croatian Kajkavian would use "svet" (Ekavian), and Croatian Chakavian would use "svit" (Ikavian). In that sense, Croatian uses Ijekavian and Serbian uses Ekavian as the official norm. Since Chakavian "ckm" and Kajkavian "kjv" are non-standard dialects, they get their own ISO codes, but since Shtokavian is a standard dialect for the two languages, it does not have its own ISO code and instead uses the code of the language. For an English speaker the differences between Serbian and Croatian may be minor, so they use the sh/hbs macrolanguage label to group the languages together, even though the hbs language itself (for all practical purposes) doesn't exist anymore.
  • Slovenian and Macedonian do not officially belong to the hbs language; only Bosnian, Croatian, Serbian and Montenegrin do. All the languages spoken in Yugoslavia existed as separate entities before Yugoslavia became a thing, then some of them were grouped together into hbs because they were very similar (more similar than, say, Croatian and Slovenian), and after Yugoslavia fell apart, the languages became separated again. The motto of Yugoslavia was "brotherhood and unity", so from a political standpoint it made sense to unify the languages into one. But to answer your question, yes - even during Yugoslavia, if you'd asked 'what language do you speak', some would have said Serbian or Croatian or whatever, and not hbs. After the separation, those who would have said hbs before would now certainly answer differently, even if the way they spoke didn't actually change. The hbs language itself is only a grouping, a sort of one-size-fits-all label that includes Bosnian, Croatian, Serbian and Montenegrin, but since the countries exist as separate entities now, nobody (apart from enwiki) uses the hbs label anymore. Ivi104 (talk) 17:33, 13 September 2021 (UTC)
Thank you for your clarifications!
  • So if I understand it better, there appears to be a matrix of nine possible varieties: one out of {Shtokavian, Chakavian, Kajkavian} combined with one out of {Ijekavian, Ekavian, Ikavian}. I can thus see why storing a lexeme as Serbo-Croatian (Q9301) (hbs) would be problematic—it can presuppose an artificial combination of Shtokavian/ckm/kjv despite the limited intelligibility between the three. If you would like hbs lexemes to instead be marked as something else, I would have no problem with that, but see below.
  • You've also stated that both Croatian and Serbian use Shtokavian as their standard norm, albeit with the former using Ijekavian and the latter Ekavian. The English Wikipedia article on the Serbian language lists some dialects that are Ekavian and some that are Ijekavian; not sure if that's a mistake. If a common Ijekavian Shtokavian word like 'svijet' can thus be used in both Croatian and Serbian (and possibly Bosnian and Montenegrin, I don't know), and all of the inflected forms of that word are identical in a given writing system across languages, is it necessarily a bad thing to store such a lexeme as "Ijekavian Shtokavian", rather than duplicating it? (If a word is used only in Croatian and not its neighboring languages, I can support keeping it as "Croatian", and similarly if a word is only used in Serbian I can support keeping it as "Serbian", but for words common to these and the other two languages I wonder if maintaining a single lexeme might be better.)
  • What are your thoughts on the Declaration on the Common Language (Q29018604)? Mahir256 (talk) 18:40, 13 September 2021 (UTC)
@Mahir256: Thank you for your questions!
  • There are in fact fewer than nine variations, as some combinations are not actually used: Shtokavian matches with Ijekavian in Croatia and Ekavian in Serbia, but Shtokavian with Ikavian is not valid; Kajkavian only matches with Ekavian, and Chakavian only matches with Ikavian, so four valid variations altogether. In addition to that, there are also the constraints mentioned before: only Croatian uses Chakavian and Kajkavian - Serbian does not use those dialects; and Croatian does not use the Cyrillic script.
  • Both Croatian and Serbian do indeed use Shtokavian as their standard norm, albeit with the former using Ijekavian and the latter Ekavian for official writing, but some dialects do use Ijekavian in Serbia, just like Ekavian is used with Shtokavian and Kajkavian dialects in Croatia - the enwiki article is correct. I'm not sure how Wikidata Lexical policies handle dialects, but physical vocabularies use only the standard dialect, so Ijekavian is handled as exclusively Croatian and Ekavian is handled as exclusively Serbian. Croatian and Serbian Wiktionaries respect this distinction (svijet=Croatian, svet=Serbian) while appropriating common words to their own languages (hr.wt says "razboritost" (prudence)=Croatian, sr.wt says "razboritost"=Serbian), while English Wiktionary merges everything into Serbo-Croatian.
  • Since the languages stem from different grammars (Croatian stems from Ljudevit Gaj's grammar, Serbian stems from Sava Mrkalj's and later Vuk Karadžić's grammars) and use different rules (e.g. Serbian grammar mandates names are to be written as they are pronounced, so George Bush would be Džordž Buš), I would prefer the languages be treated as separate entities. Some noun entries may occasionally be the same (as is the case with other languages: the word svet is also used by Slovak, Slovene, and Swedish, to name a few), verbs will differ in forms (e.g. bit ću razborit in Croatian vs biću razborit / бићу разборит in Serbian [prudent - simple future (Q1475560) indicative (Q682111) first-person singular (Q51929218)]), and all Serbian entries will presumably use only Cyrillic script (as is the case on sr.wiktionary) or both Latin and Cyrillic, while Croatian uses only Latin script, so duplication should not be a concern. As for Serbo-Croatian entries, entries in Cyrillic should go under Serbian and entries in Latin under Croatian, and I can fix any discrepancies manually.
  • Declaration on the Common Language (Q29018604) is an interesting proposition. The languages are mutually understandable (although most non-Serbian teens and children do not read or write Cyrillic). Many people will refuse to sign because they will see this as a political move of Serbian nationalists, a step closer to reforming Yugoslavia where "brotherhood and unity" translated to Serbs having the final word in all the decisions because they held the majority. There is still a lot of hurt and mistrust between our two peoples due to the Balkan Wars, but hopefully that can be mended with time. This appears like some amateur fringe group with supposedly noble intentions, but I think nothing much will come of it; there are propositions like this made every day by someone. Ivi104 (talk) 03:20, 14 September 2021 (UTC)