Help talk:Label

Help:Label

Copied from Wikidata:Project chat.

Shouldn't we have a help page for labels, like the one we have for descriptions? This help page would need to contain guidelines on the capitalization of labels. Currently, I cannot find an agreed basis for such changes. --Leyo 14:27, 22 November 2012 (UTC)

Some relevant discussions: 1 2 3 4 5. --Yair rand (talk) 14:37, 22 November 2012 (UTC)
Thanks. As far as I can see, only #4 is about capitalization.
Would you agree that we need the proposed help page? --Leyo 14:40, 22 November 2012 (UTC)
Yes, that's why I linked to relevant previous discussions, so we have something to build a standard from. Still, the issues could use some further discussion. --Yair rand (talk) 14:44, 22 November 2012 (UTC)
I also support making a help page for labels; however, much like Yair rand, I think some more discussion is in order. --Theopolisme (confess) 18:38, 22 November 2012 (UTC)
Strong support. Wagino 20100516 (talk) 23:32, 22 November 2012 (UTC)
What exactly needs to be discussed more in addition to the capitalization? --Leyo 13:17, 23 November 2012 (UTC)

I think there should definitely be a Help:Label page. If someone creates a proposed guideline for Labels we could then discuss the specific guidelines and publish it when consensus is reached. Things other than capitalisation that need to be explained are:

  • No disambiguation in the Label, leave it to the Description. (e.g. "London, England" should just be "London")
  • What happens if there is no page in your language, use a foreign Label, or translate it? Is no label better than a non-translated one?
  • Labels for variants within one language such as Canadian and British English (though this could be covered by the language fallback).

There may be others but that's all I can think of for now. Delsion23 (talk) 13:24, 23 November 2012 (UTC)

No disambiguation: agree. Translation of labels: from which language? Maybe it's better to leave it blank, since searches will also find the interwikis. Besides: no attributes forming part of the name, like "New York City" or "City of New York"; it should be "New York" as label and "city" as description, I think. (At least in my language; I'm not sure if that's also applicable in every language.) Regards. --Dalton2 (talk) 14:47, 23 November 2012 (UTC)
Agree with Delusion: no disambiguation in labels. I think we should go ahead and start working on a draft at Help:Label. When that page is relatively stable and has community approval, we can think about how to publicize it (and Help:Description, for that matter), perhaps with tooltips/links on Special:CreateItem. For now, I agree, writing it is the goal. --Theopolisme 15:26, 23 November 2012 (UTC)
OK, I was bold and drafted Help:Label. --Leyo 16:05, 23 November 2012 (UTC)
  • Support #1 and #3, but Oppose #2: the use of lowercase in the label. We would have to change all labels for all languages, and it would have to be done manually. The bots that have added labels so far have never followed this rule either. It should have been set at the beginning of the project, not now. Now it is useless and harmful, in my opinion. How many of you knew that the first letter had to be lowercase? How many of you have entered it in lowercase? Raoli (talk) 23:04, 23 November 2012 (UTC)
  • Support #1, #2 and #3. The technical difficulty of achieving #2 arises from the automatic work done by the bot, which cannot do any better since this is a task that requires human intelligence. Common names appear in lowercase in dictionaries and encyclopedias, and the case is also a piece of information. For example, a planet is not the same as something called Planet. I think that technical difficulty is not a solid argument, and neither is aesthetics. It's a matter of patience and dedication, and the result is undoubtedly better than leaving all the labels in uppercase. --Dalton2 (talk) 23:28, 23 November 2012 (UTC) Regarding translations, I prefer to find the most appropriate name in my own language for the entity the item represents, rather than translating it directly, and I prefer to leave it blank when one is not one hundred per cent sure about it. If there's only one interwiki and it's impossible to find the corresponding name in any other language, then it's probably acceptable to translate it, but in some cases the name is not translatable, so that decision must be taken with care. --Dalton2 (talk) 00:17, 24 November 2012 (UTC)
  • Support #1, #2 and #3. I agree with Dalton. --Rondador 23:35, 23 November 2012 (UTC)
  • Support #1, #2 and #3. If you can translate the label, fine; otherwise it is better to have no label. On the initial letter I agree with Dalton: I think it is better to use a system similar to that of Wiktionary rather than Wikipedia. --Beta16 (talk) 23:52, 23 November 2012 (UTC)
  • Oppose the lowercase letters at the beginning of the description; it's nonsense. At the beginning of a sentence we must write with uppercase letters. Oppose removing disambiguation, because there are a lot of pages in Wikipedia with the same name, for example French communes named Saint-Aubin (more than 54); it's nonsense not to keep the disambiguation. We are also encountering problems with the Gadget-autoEdit.js gadget, which does the contrary. Support on scientific versus common names. -- Bertrand GRONDIN → (écrire) 16:03, 10 December 2012 (UTC)
But the disambiguation is needless as it is already disambiguated in the description. What's the point in writing the disambiguation in both fields? The lower case at the beginning of a description will make it easier for it to be machine read. It's easier for a computer to add a capital at the beginning of a sentence than it is for it to make a word lower case in the middle of a sentence (as a computer may not be able to know whether it's supposed to be a proper noun or not). Delsion23 (talk) 00:23, 11 December 2012 (UTC)
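
Delsion23's point about machine reading can be illustrated with a minimal Python sketch. The helper name `render_label` is made up for this illustration and is not part of any Wikidata tool; it assumes labels are stored the way they would appear mid-sentence. Capitalizing a stored label at the start of a sentence is mechanical, whereas the reverse direction would need to know whether the word is a proper noun.

```python
def render_label(label: str, sentence_start: bool = False) -> str:
    """Return a label for display, capitalizing it only at a sentence start.

    Hypothetical helper: assumes labels are stored as they would appear
    mid-sentence ("planet", "London").
    """
    if sentence_start and label:
        # Upper-case only the first character; leave the rest untouched.
        return label[0].upper() + label[1:]
    return label


print(render_label("planet", sentence_start=True))  # Planet
print(render_label("planet"))                       # planet
print(render_label("London"))                       # London
```

There is no equally simple inverse: given "Planet" or "London" alone, code cannot tell which one should be lowered.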

Colors

I think that colors are helpful in distinguishing, but if consensus says not, I don't have a problem with that. --Theopolisme 16:21, 23 November 2012 (UTC)

See Help talk:Description#Colors. I think we should discuss this in one place (here or there) only. --Leyo 16:25, 23 November 2012 (UTC)
Sure, sounds fine. --Theopolisme 16:29, 23 November 2012 (UTC)
I think we should not discuss the use of colors in itself but its usefulness. Color is just a device to help you remember a specific thing or rule (as in advertising logos). To help people remember a concept you could also use a different font family or some other expedient. This is what I know about. Raoli (talk) 16:38, 23 November 2012 (UTC) P.S. The colors in the sections of Help:Description are there to distinguish each section from the others, so that your brain associates colors and images with the text and memorizes it.
I know what the intention was, but I still think it's more distracting than helpful. People from Latin countries tend to like things more colorful. --Leyo 16:47, 23 November 2012 (UTC)
Then it is simple: every language shall decide whether to adopt the colors or not. Those who want the colors can add them, and those who do not want them need not. --Raoli (talk) 16:51, 23 November 2012 (UTC)
I second that the colors add nothing but are very distracting. This is also Sven Manguard 18:43, 27 November 2012 (UTC)

Subscripts and similar

The correct label for Q61124 would be “C19H21N” (with 19 and 21 as subscripts). There are surely many similar cases in chemistry and in other subjects. Should we add a statement on that? --Leyo 16:43, 23 November 2012 (UTC)

I suppose it's not difficult to include subscript and superscript (anything else?). I think that for now it should be enough to allow the inclusion of codes like <sup></sup> and <sub></sub>, but in the future there could be a WYSIWYG editor to be more consistent with the spirit of the database. I Support. Regards. --Dalton2 (talk) 11:10, 24 November 2012 (UTC)
Couldn't we just use the subscript and superscript numbers included in Unicode? See w:en:Unicode subscripts and superscripts#Superscripts and subscripts block. --Yair rand (talk) 08:12, 26 November 2012 (UTC)
I had this idea, too. But I fear that it might cause problems if converted to plain text (with no Unicode characters). --Leyo 08:41, 26 November 2012 (UTC)
Every inline text formatting (in Wiki syntax or HTML markup) should be allowed. Sometimes it will also be desirable to include links (but the problem will be how to resolve those links if the same Wikidata set must be used across several wikis; it may work, however, in a locally installed Wikidata extension to a wiki). We should not tolerate external links to specific Internet domains, but sometimes it will be helpful to have links pointing to external wikis via pagename prefixes using the wiki syntax of links. For some cases, we'll need to include inlined images, or even TeX math formulas.
So what is the use of Wikidata? We should allow storing in a field any content that is writable in Wiki syntax (including template calls?), because the purpose of Wikidata is to allow storing a collection of small wiki elements as if they were in the same page for the same data row.
I don't know exactly how Wikidata works, but for me it's just a GUI for editing these template subpages (one for each row) from a base page listing all rows in a given data table (acting like a topic selector). However, there are differences: the main page (for the topic) can use a mechanism to infer some default values (when no value is given for a specific key, i.e. a specific column in tabular data), and the main page can also format the data returned by calling the subpage as a template transclusion, performing additional #if's where needed, to use data from another column or to compute it.
A Wikidata row should be like a subpage of a main page that lists the subpages using template calls, where each subpage uses a "#switch" to return the value matching a key (the name of a column in tabular data). I have already experimented with this for several years, using templates written only in wiki syntax (without Wikidata, which did not exist), and various templates in Wikipedias use such techniques to store tabular data which may contain optional columns for optional attributes in a data row. The name of the subpage is like the row id.
Wikimedia already attaches several (keyed) lists of metadata to each page: a history of versions, a list of categories, a list of interwikis, and some other information. Wikidata is a generalisation of this, allowing queries to be performed. But it will only be valuable if it can also query the content of a table (or query) by returning how many rows it contains, and by allowing a loop over its rows and navigation through them in limited ranges. Verdy p (talk) 14:00, 26 November 2012 (UTC)

I added H₂O: Just Add Water for the label to have an example. If we disallow Unicode characters in the label, we may at least allow them for aliases. --Leyo 00:17, 4 December 2012 (UTC)

If we disallow Unicode in any area, most languages will be incompatible, so I think it's safe to assume that Unicode is fine everywhere, hm? --Yair rand (talk) 00:20, 4 December 2012 (UTC)
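
The Unicode subscript idea raised above, together with the worry about converting such labels back to plain text, can be sketched in Python. The function names `formula_label` and `plain_text` are hypothetical and do not correspond to any Wikidata tool. The sketch relies on two standard-library features: `str.translate` for the digit mapping and `unicodedata.normalize` with the NFKC form, which folds subscript digits back to ASCII digits.

```python
import unicodedata

# Map ASCII digits to the Unicode subscript digits U+2080..U+2089.
TO_SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def formula_label(plain_formula: str) -> str:
    """Turn every digit into a subscript (hypothetical helper).

    Only suitable for simple formulas where all digits are subscripts;
    leading coefficients or charges would be converted wrongly.
    """
    return plain_formula.translate(TO_SUBSCRIPT)


def plain_text(label: str) -> str:
    """Fold compatibility characters such as subscript digits to ASCII."""
    return unicodedata.normalize("NFKC", label)


print(formula_label("C19H21N"))           # C₁₉H₂₁N
print(plain_text("H₂O: Just Add Water"))  # H2O: Just Add Water
```

NFKC also rewrites other compatibility characters (ligatures, superscripts, full-width forms), so it is a blunt instrument for a plain-text fallback rather than a dedicated subscript converter.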

No page in English

Separating into sections for accessibility. --Theopolisme 17:12, 23 November 2012 (UTC)
What should be done if there is no page in English and there is no (agreed) translation?

I think that we should follow several steps: 1) We ask: are there any sources in English for the name? If the answer is yes, we check if they all agree. If they do, we have the name. If they don't, we choose the name that appears in the most reliable source or sources from the ones available. If there aren't any sources in English, we ask: 2) are there any sources for the name in languages other than the original one? If the answer is yes, we check if they all agree (i.e., if they are literal translations of each other). If they do, we do a literal translation. If they don't agree or there aren't any, 3) we do a literal translation (if possible) or copy the untranslated name into the English field (to be determined by the community). Please read the thread below about references for labels and aliases. Regards. --Dalton2 (talk) 11:27, 24 November 2012 (UTC)
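
The steps in the comment above can be sketched as a small Python function. Everything here is an assumption made for illustration: the function name `choose_english_label`, the numeric reliability score attached to each source, and the choice to prefer a literal translation over the untranslated name (the comment leaves that choice to the community). Steps 2 and 3 are collapsed, because in both of them the outcome is a literal translation when one is possible.

```python
from typing import Optional, Sequence, Tuple


def choose_english_label(
    english_sources: Sequence[Tuple[str, int]],
    literal_translation: Optional[str],
    original_name: str,
) -> str:
    """Pick an English label (hypothetical sketch of the proposed steps).

    english_sources: (name, reliability) pairs; the score is an assumed
    stand-in for "most reliable source".
    literal_translation: a literal translation if one is possible, else None.
    """
    # Step 1: English sources exist.
    if english_sources:
        names = {name for name, _ in english_sources}
        if len(names) == 1:
            # All sources agree on one name.
            return english_sources[0][0]
        # Sources disagree: take the name from the most reliable source.
        return max(english_sources, key=lambda source: source[1])[0]
    # Steps 2 and 3: no English sources, so translate literally if possible.
    if literal_translation is not None:
        return literal_translation
    # Last resort: copy the untranslated name.
    return original_name


print(choose_english_label([("Burma", 1), ("Myanmar", 3)], None, "Myanmar"))
print(choose_english_label([], None, "Wynental- und Suhrentalbahn"))
```

The example names are taken from elsewhere on this page and the scores are invented; they say nothing about which name is actually better sourced.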
In Spanish, we give preference to linguistic sources, especially the works published by the Association of Spanish Language Academies, and, alternatively, other reliable sources related to language. That's what is written in our manual of style. In other languages, maybe they also follow the institutions which appear in this list. I can see that there is no language regulatory body for the English language. That's bad news. For the rest of the languages, I think that it would be reasonable to allow references for the labels. --Dalton2 (talk) 11:55, 24 November 2012 (UTC)
I think that this section can't be the same for both English and the rest of the languages. The points already added are clearly English-centric, and thus they can't just be translated into other languages. For example, in Spanish we follow common use, but that common use is endorsed by the sanction of the Academies when available. Also, if the item is a proper noun that has an article on a Wikipedia in another language using a Latin-derived alphabet, we don't always use that: location names are often translated into Spanish. Points 4 and 5 refer to the level of confidence of the user: we don't rely on confidence, we rely on reliable sources. Confidence might be applied only in the case that there are literally no sources available, and in such a case the translation would be temporary until a properly sourced translation appears. Wikidata shouldn't be original research based on users' opinions, not even for the labels. And finally, we don't take 'common use' literally; it is assumed that common use refers to usage in an academic environment, and not vulgar speech; that's what the Academies sanction. --Dalton2 (talk) 23:41, 27 November 2012 (UTC)
(edit conflict) I put in a five-step process that takes into account Dalton2's suggestion as step 1. If anyone disagrees, we can tweak it. Sven Manguard Wha? 23:42, 27 November 2012 (UTC)

Disagree. This is not the English Wikidata. This is the multilingual Wikidata. I think the label should be in Unicode, including any accented letters. For items for which there is no generally used transliteration the label should be in the language and script normally used for that item. Remember that this info will be imported into data boxes in lots of languages. Why would these languages have databoxes with a heading in English? Filceolaire (talk) 02:32, 7 January 2013 (UTC)

Every language has its own labels. Unless you've set your language to English, you're never even going to see the English label. Because of that, each language can tailor labels to its needs. Sven Manguard Wha? 22:11, 7 January 2013 (UTC)
This policy should not be translated word for word into every language, but adapted for each language to say what to do if there is no page in that language. Carsrac (talk) 17:44, 14 February 2013 (UTC)

References

One more issue about labels that should be discussed is their scope. Maybe the developers didn't have in mind a rigorous use of labels. But, if the scope is not only internal use, then they should also have references to keep the database reliable. And the same should apply to the aliases. That way Wikidata could act as a reliable source for, for example, the titles of the articles in Wikipedia themselves, something that doesn't exist so far, at least in an official way. And, internally, references would also prevent subsequent changes to labels (or aliases) without a solid basis for them. Opinions? --Dalton2 (talk) 18:57, 23 November 2012 (UTC)

Support. I strongly agree with this point. I believe referencing should be a standard policy. --SynConlanger (talk) 20:30, 17 August 2015 (UTC)

Italics

There are IMO some labels or parts of them, such as binomial names or descriptors in chemistry, that would need to be written in italics. Examples:

Currently, it does not seem to be technically possible. If we agree that there is a need for italics in labels, we might request this by opening a bug. Thoughts? --Leyo 09:10, 24 November 2012 (UTC)

Here they talked a bit about italics, but nothing conclusive. I do think that italics are also information: it's impossible to know whether a given word must be written in italics if that information is not contained somewhere in the item. I suppose it's not difficult to include italics, so I Support the proposal. For now it should be enough to allow the inclusion of codes like '' or <i></i>, but in the future there could be a WYSIWYG editor to be more consistent with the spirit of the database. Regards. --Dalton2 (talk) 11:06, 24 November 2012 (UTC)
I also Support italics: it makes sense, and, almost more importantly :), sounds like it should be relatively easy to implement. --Theopolisme 15:06, 24 November 2012 (UTC)
I think that it's important information, but it still might make sense not to add it to the label. In phase two, we'll have many opportunities to add all sorts of data, including extra information about names. Perhaps we should add italics as a specific data point, so that we still have plain text in labels. (Also, I strongly suspect having HTML in labels would be really difficult to implement, but we should really ask a dev about that.) --Yair rand (talk) 08:09, 26 November 2012 (UTC)
Plain text can always easily be derived from a label, even if it is (partly) in italics. --Leyo 08:39, 26 November 2012 (UTC)
This is not just about italics. A more common problem occurs with superscripts (sometimes subscripts as well), which make a significant difference (think about chemical formulas where BOTH are used, and where the charges using minus and plus should remain in superscripts, otherwise you get another formula for a different compound).
Converting them to plain text by dropping the HTML formatting may give something wrong.
So we should accept data using some basic wiki markup or inline HTML markup. For some languages this is even the only alternative possible, because the markup is used to generate the correct text layout. This will be needed notably for storing translations in this first project.
For other usages of Wikidata (e.g. for attaching metadata to a wikipage, like several transliteration schemes on Wiktionary, or alternate sort keys for different collation conventions in the same language, where these keys cannot be deduced only by UCA/CLDR tailoring rules, e.g. to index a page in distinct categories according to collation schemes, or for creating tabular data such as population, area in km² or ha, electoral results, and to allow generating a formatted and paginated or sorted table from Wikidata), it will be less problematic.
Thanks. Verdy p (talk) 13:42, 26 November 2012 (UTC)
You can forget about italics in labels. Note that there is a bug for adding italics and it is marked as WONTFIX - bugzilla:41749. The same goes for any other wiki syntax - bugzilla:41560.--Snaevar (talk) 23:41, 27 November 2012 (UTC)
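
The disagreement above about deriving plain text from marked-up labels can be made concrete with a short Python sketch. It is purely illustrative: markup in labels was rejected in the bug reports cited above, and the helper name `to_plain_text` and the set of tags it handles are assumptions. Dropping italics is harmless, while dropping subscript and superscript tags can leave an ambiguous string.

```python
import re

# Matches opening and closing <sub>, <sup> and <i> tags only.
INLINE_TAG = re.compile(r"</?(?:sub|sup|i)>", re.IGNORECASE)


def to_plain_text(label: str) -> str:
    """Drop inline formatting tags and keep the text (hypothetical helper)."""
    return INLINE_TAG.sub("", label)


print(to_plain_text("<i>Homo sapiens</i>"))          # Homo sapiens
print(to_plain_text("Hg<sub>2</sub><sup>2+</sup>"))  # Hg22+
```

In the second case the plain string no longer shows which "2" was the subscript count and which belonged to the charge, which is the kind of loss described in the comment about chemical formulas.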

Revamp

I did a revamp today. Any thoughts would be welcome. Sven Manguard Wha? 03:17, 28 November 2012 (UTC)

Very nicely done. Maybe a bit more whitespace than necessary, though. --Yair rand (talk) 10:52, 28 November 2012 (UTC)
@Yair rand, I've taken the liberty to kill some of the white space. --Theopolisme 12:04, 28 November 2012 (UTC)
In general, the guideline looks promising, but there is one small yet significant point which concerns me.

The English Wikipedia is a haven for nationalists and trolls, and hijacking discussions about article titles is high up such users' to-do lists. Rather than mirror any consensus there, or worse yet, give such people scope to continue their arguments on Wikidata, I would prefer that we foster a culture that encourages people to contribute. In general, we should defer to the judgement of the person who took the time to give the previously neglected item a label in the first place, unless the case that another name is more common is compelling. I would therefore suggest that we add words to this effect, and drop the emphasis on deferring to the wisdom of en.wp. --WFC 10:58, 28 November 2012 (UTC)

Does this work? This is also Sven Manguard 15:37, 28 November 2012 (UTC)
That's perfect for the lead, and the lead on its own more or less addresses my concern. Perhaps something brief in the body would help, making crystal clear that our primary concern is that the label is not an obscure term (or to phrase it another way, that we are aiming for a common name – once we have one, we have little to no interest in debates over which of multiple common names to use)? --WFC 19:49, 28 November 2012 (UTC)
Regarding the namespace section, I'm not sure Wikidata will be using other namespaces at all. Wikidata is meant to be improving the article quality, not the project quality. I'd have to ask the Development Team, though. Ypnypn (talk) 14:10, 28 November 2012 (UTC)
See Wikidata:Requests for comment/Inclusion of non-article pages. --Leyo 14:29, 28 November 2012 (UTC)
Having read through that, I'm still not entirely sure what the consensus is. Has a decision been made? This is also Sven Manguard 15:18, 28 November 2012 (UTC)

Layout sample of a Wikidata page

Why does the sample image still use the prefix "an" in the description, which doesn't comply with this rule? Can it be fixed along with the others, please? Wagino 20100516 (talk) 17:49, 28 November 2012 (UTC)

It has an "an" because I fully intend to get around to having that guideline changed. Could you please point me to the discussion where this was agreed upon, because I don't remember it having consensus, but I may be thinking of something else. This is also Sven Manguard 18:36, 28 November 2012 (UTC)
Yeah, it's a proposed policy, and it seems there isn't an agreement or consensus that decided it, for now. Wagino 20100516 (talk) 19:20, 28 November 2012 (UTC)

General guideline or English only?

Is this proposal a general guideline for the whole of Wikidata or only for the English language? If it is going to be a general guideline and subject to translation, then there are a few additional points that haven't been discussed yet. Plus, as was mentioned above, the guideline is really English-centric. Thank you. --Zanka (talk) 21:09, 28 November 2012 (UTC)

I was operating under the assumption that each language would decide its own rules. Sven Manguard Wha? 23:55, 28 November 2012 (UTC)

Sunflower and Fireweed capitalised?

Don't these two items contradict the capitalisation rules mentioned above them? They are not proper nouns, just like rabbit. 130.88.141.34 16:05, 29 November 2012 (UTC)

The examples have been fixed by Sven Manguard. --Yair rand (talk) 16:51, 29 November 2012 (UTC)
I meant to come and thank you here, but I seem to have forgotten. I did give a shout out in the edit summary though. This is also Sven Manguard 20:52, 29 November 2012 (UTC)
The second example was recapitalized in 2020 by an editor who keeps changing common names to capitals. I created a discussion on common names at Help talk:Label. UWashPrincipalCataloger (talk) 03:44, 2 January 2022 (UTC)

Multiple common names

I think we need a subsection outlining what to do in the case of multiple common names – from my experience on en.wp these situations lead to the most bitter, protracted disputes. In my opinion, the solution is to stick with the first common name, unless the evidence that another name is more common is overwhelming or irrefutable. I strongly feel that we should use language like that in order to discourage frivolous renames/renaming requests. However, given how controversial the area is, I'm hesitant to put my opinion straight into the draft without gauging others' thoughts first. --WFC 22:10, 29 November 2012 (UTC)

I know what you mean. The arguments over the naming of Inter Milan (or is it Internazionale??) are a good example of a meaningless fight over titles. A policy whereby the first common name entered is the one that sticks could possibly help. It could work in the same way it helps on en.wiki with English variants, where if an article is started in American etc. it stays that way unless there is a good reason to change it to another variant (e.g. it's an article about the Queen or something). That's not to say that if someone creates an item called "Passeridae" they can't change it to "Sparrow". The common name always wins. If there are many common names, the first one wins. Delsion23 (talk) 22:24, 29 November 2012 (UTC)
Just to double-check that we're on the same page Delusion, under the sort of policy we're discussing, an item such as Q1075 or Q10676 should retain whatever label was added first? The one exception being where the creator adds a label but then immediately changes it, as appears to be the case with Q1075. Thus, Q1075 should be "color"; Q10676 should be "Mega Drive"? If that is roughly how you envisage it, then I'll add a draft paragraph in soon. --WFC 08:06, 3 December 2012 (UTC)
On those particular items I think the issue would mainly be covered by the proposed English variants options (once English is renamed American English, which is what it seems to represent unofficially). The Mega Drive would be that in en-gb but Genesis in en-ca and (proposed) en-am. Color would remain so in en-am but be Colour in en-gb and en-ca. I think the issue we are discussing is one that cannot be solved by English variants, i.e. items that have two or more common names regardless of nationality. Examples would be Burma/Myanmar, Taiwan/Republic of China etc. Delsion23 (talk) 02:19, 5 December 2012 (UTC)
Hello,
Why not consider that a label for en: should respect the en.wp rules, and so on for other languages? On fr: we also have many fights about such stupid considerations (transliteration of foreign names, the equivalent of the color/colour problem, …). We do have rules and recommendations about that, and I guess most Wikipedias have such rules, so why not use them? Regards, Hexasoft (talk) 08:30, 26 August 2015 (UTC)

Bot setting labels with brackets

User:Yair rand pointed out that my bot is currently creating en-labels with brackets. I know this is not 100% correct, but I do not think removing all brackets would be best. It is currently not possible to tell whether a label needs its brackets. Many labels do not need them, but for the few where they are correct, removing that part would be a bigger mistake, because it would be nearly impossible to find those labels later. The other labels would need a description once the brackets are gone, but if you set a description you could also remove the brackets by hand. For the de-labels, MerlBot is creating a list here: Wikidata:Labels_and_descriptions_task_force/de#Bezeichnung_mit_Klammer. Yair rand would rather have no labels set at all than wrong labels. As there is no chance of getting everything 100% right, this would mean doing everything manually rather than only correcting the mistakes. What do you think? --Sk!d (talk) 23:00, 2 December 2012 (UTC)

If the brackets are kept, we have to spend ages fixing the labels of 99% of them. If the brackets are not kept, we spend much less time fixing the 1% of articles that are the exception. Delsion23 (talk) 00:39, 3 December 2012 (UTC)
I am not sure this is correct. How would you find the labels that the brackets do belong to? --Sk!d (talk) 08:23, 3 December 2012 (UTC)
Can I just clarify what the issue is? If it's a decision between an article title with brackets, or no label at all, then while I sympathise with Yair rand, I'd go as far as to say that we must go for the former. English, French, German and Italian speakers might have the manpower to add affected labels manually in a reasonable period of time, but many languages are dependent on the bots. If I search for the Persian language name of an Iranian town, I am much more likely to find what I am looking for with a label in the form of [Persian name (something useless)] than if the label is blank.

If on the other hand it's a decision between keeping all brackets and getting rid of all brackets, my opinion is the same as Delusion23's. --WFC 07:48, 3 December 2012 (UTC)

This is currently only about en-labels. --Sk!d (talk) 08:23, 3 December 2012 (UTC)
If we are talking about English only, then on balance I just about agree with Yair rand for the time being. Hopefully someone will come up with a gadget which will eventually allow us to do it Delusion's way, marking entries which should contain brackets as such.

But to reiterate, in case the outcome of this discussion makes its way into some sort of guideline, any consensus here must be for English labels only. If we ignore French, German and Italian, bots probably provide 90% of labels for other languages. It would be wrong to radically change the nature of this support for a hundred or so relatively small languages (in terms of the number of Wikidata editors), based on a four-person conversation on what works best for English. --WFC 09:07, 3 December 2012 (UTC)

OK, then after the update I will change my bot to remove brackets for en-labels. --Sk!d (talk) 18:09, 3 December 2012 (UTC)
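
As a rough illustration of what removing brackets for en-labels could look like, here is a minimal Python sketch. It is not Sk!d's bot code; the function name `label_from_title` and the regular expression are assumptions. It strips a single trailing parenthetical from an article title and leaves everything else alone.

```python
import re

# One trailing "(...)" group with optional surrounding whitespace.
TRAILING_BRACKETS = re.compile(r"\s*\([^()]*\)\s*$")


def label_from_title(title: str) -> str:
    """Strip a trailing parenthetical disambiguator (hypothetical helper)."""
    stripped = TRAILING_BRACKETS.sub("", title)
    # Keep the original if stripping would leave nothing.
    return stripped if stripped else title


print(label_from_title("Mercury (planet)"))               # Mercury
print(label_from_title("(500) Days of Summer"))           # (500) Days of Summer
print(label_from_title("Wikipedia:Notability (people)"))  # Wikipedia:Notability
```

The last call shows the kind of exception raised in this thread: the brackets are part of the real name, yet a purely mechanical rule removes them, so such items would still need manual review.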

How to merge two items

How do I merge different items that have interwikis for different languages but describe the same object? For example, there are Q3506176 (Fermented milk products) with en, cs and es interwikis and Q4222225 (with no label) with ru and uk interwikis. They correspond to the same product. What should one do in such cases? --Koryakov Yuri (talk) 19:21, 17 February 2013 (UTC)

Image update required: Statements

The image on the front is lacking "Statements". It would be good if it could be updated to include that component.

Capitalization

According to Help:Label, "Labels begin with a lowercase letter except for when uppercase is normally required or expected". However, when I use the check sitelink tool to create a new item, all imported labels start with an uppercase letter. See for example Q5054649 or Q5190255. Should I change them to lowercase? Andreasmperu (talk) 18:49, 23 February 2013 (UTC)

Yes, the tool can't know when a lowercase letter is needed and for which languages, because every language can have different rules. So you need to change them manually. --Stryn (talk) 18:52, 23 February 2013 (UTC)
Ok, thanks, I will. Andreasmperu (talk) 19:16, 23 February 2013 (UTC)

Labels for non-article items

Should the first letter of the label always be uppercase, or lowercase? For the Wikipedia namespace it is of course uppercase, but how about templates and categories? --Stryn (talk) 08:51, 28 February 2013 (UTC)

I think the status quo is fine (leaving it capitalized), since "Template:Foo" or "Category:Foo" is what appears in the title of the page, and how it is properly styled on Wikipedia (in English)--the same way that we call this page "Help:Label" and not "help:label". To depart from the default formatting would also create busy-work that is very low priority. Regards, Espeso (talk) 15:20, 28 February 2013 (UTC)
Yes, these are names, so they should retain the capitals - the same as they would within a sentence. --Avenue (talk) 18:09, 28 February 2013 (UTC)

Labels for one topic split over multiple articles

Wondering what to do in the case of Q6564355, Q6564357 and Q6564358 -- these all cover the same topic, but have been split up on enwiki in order to keep the lists to a manageable size. Should the labels therefore include the disambiguation contained within the parentheses, or should that be left to the description? Or should the label be "list of Bolton Wanderers F.C. players with fewer than 25 appearances"? Buttons to Push Buttons (talk) 21:16, 10 March 2013 (UTC)Reply

Parentheses in non-article or disambiguation page.

For label of Q395864, "300 (disambiguation)" would be better than "300". It is easy to distinguish from normal items. It could be applied for non-article pages. The label of Q4663321 is "Wikipedia:Notability". But "Wikipedia:Notability (people)" is more helpful. -- ChongDae (talk) 05:30, 28 March 2013 (UTC)Reply

For the former, the "(disambiguation)" part is not needed. It is already disambiguated from other items called "300" by its description. In the latter case, I agree that the "(people)" should be included as that is part of the full name of the guideline. Delsion23 (talk) 14:15, 29 March 2013 (UTC)Reply
Yes, both are as you said. --Stryn (talk) 15:22, 29 March 2013 (UTC)Reply
I think that Wikipedia disambiguation pages should be explicitly mentioned in this help page. Do they fit into the "Non-article items" section? They are already mentioned here: Help:Description#Non-article items. --Pabouk (talk) 01:20, 22 June 2013 (UTC)Reply

Translate geographical names?

While countries and big cities mostly have English names that are widely known, I found translations where I'm not sure if they really make sense. And then, the question is also related to company names: There is a railway company in Switzerland, serving two valleys, and its name is Wynental- und Suhrentalbahn. The article in the English Wikipedia can be found under Wynental and Suhrental railway. But the Wikidata label is Wyna Valley and Suhre Valley Railway. Shouldn't the label be the same as the English article or even the original company name? -- Gürbetaler (talk) 22:17, 6 April 2013 (UTC)

Season in parenthesis?

En-wiki has an article called en:My Three Sons (season 9). Should it be labelled here as "My Three Sons (season 9)", or as "My Three Sons" with "season 9" as the description? I've seen that some users include "season X" in the label, and some do not. So how? --Stryn (talk) 20:01, 8 June 2013 (UTC)

(season 9) is really part of the item name and should be in the label. If it was a disambiguation like (TV series) or (film) it should not be in the label. HenkvD (talk) 17:32, 23 June 2013 (UTC)Reply

Non-article items

We have to change the text on Wikidata:Label#Non-article_items, because now there are also Wikivoyage links, which have been linked with Wikipedia links. So two namespaces (Wikipedia, Wikivoyage) are in the same item. --Stryn (talk) 13:16, 24 July 2013 (UTC)

Disambiguation

I would add a specific example for a disambiguation page in the subsection Examples of Wikidata:L#Disambiguation, e.g.:

Wikipedia article: Rice (disambiguation)
Wikidata label: Rice
Wikidata description: Wikipedia disambiguation page
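
The example above can be written as a small rule. The following is only a minimal Python sketch, not an existing Wikidata tool: the function name label_for_disambiguation_page is hypothetical, and it assumes that the page title carries the exact English suffix " (disambiguation)".

```python
from typing import Tuple

# Assumption: English Wikipedia marks these pages with this exact suffix.
SUFFIX = " (disambiguation)"


def label_for_disambiguation_page(title: str) -> Tuple[str, str]:
    """Return (label, description) for a Wikipedia disambiguation page title.

    Hypothetical helper: the suffix is dropped from the label, and the
    page type goes into the description instead.
    """
    if title.endswith(SUFFIX):
        label = title[: -len(SUFFIX)]
    else:
        label = title
    return label, "Wikipedia disambiguation page"


print(label_for_disambiguation_page("Rice (disambiguation)"))
```

Parentheticals that are part of the name itself, such as "(season 9)" discussed in the section above, are deliberately left alone by this sketch because only the one suffix is matched.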

need of first character capitalization property / flag

Hi! I changed hundreds of first characters from uppercase to lowercase, mainly in Esperanto, Romanian, and English. For lack of time I did not change these in Cyrillic scripts. I think that one should have a property with the values uppercase, lowercase, and language dependent. I assume that these property / flag fields can be set automatically:

  • disambiguation pages will always start with uppercase letters
  • pages from WMF projects which use namespaces other than (Main) will always start with uppercase letters
  • humans and places will all start with uppercase letters
  • professions will all start with LOWERCASE letters
    sample query: http://208.80.153.172/api?q=claim[31:28640]
  • basic concepts will all start with LOWERCASE letters
    • special concepts may start with UPPERCASE letters

Exception handling:

  • Polish will start with an uppercase letter in enwiki but will start with a lowercase letter in most other languages. So an exception list with language codes is required here. I am not sure about Latin.

If a KISS model / implementation can be added to Wikidata it would save a lot of time and help a lot of smaller language communities. Regards לערי ריינהארט (talk) 13:29, 12 March 2014 (UTC)
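
As an illustration of the proposal above, here is a minimal Python sketch of such a flag with a per-language exception list. Everything in it is an assumption made for the example: the kind names paraphrase the bullet list, "language name" is one reading of the Polish example, and the exception set holding only "en" is deliberately incomplete. No such property exists in Wikidata.

```python
from enum import Enum
from typing import Optional


class Casing(Enum):
    UPPERCASE = "uppercase"
    LOWERCASE = "lowercase"
    LANGUAGE_DEPENDENT = "language dependent"


# Hypothetical item kinds, paraphrasing the bullet list in the proposal.
CASING_BY_KIND = {
    "disambiguation page": Casing.UPPERCASE,
    "non-main namespace page": Casing.UPPERCASE,
    "human": Casing.UPPERCASE,
    "place": Casing.UPPERCASE,
    "profession": Casing.LOWERCASE,
    "basic concept": Casing.LOWERCASE,
    "language name": Casing.LANGUAGE_DEPENDENT,
}

# Illustrative and incomplete exception list: language codes in which a
# language-dependent label takes an uppercase first letter.
UPPERCASE_LANGUAGE_CODES = {"en"}


def resolve_casing(kind: str, lang: str) -> Optional[Casing]:
    """Resolve the flag for an item kind in one language (None if unknown)."""
    casing = CASING_BY_KIND.get(kind)
    if casing is Casing.LANGUAGE_DEPENDENT:
        if lang in UPPERCASE_LANGUAGE_CODES:
            return Casing.UPPERCASE
        return Casing.LOWERCASE
    return casing


def apply_casing(label: str, kind: str, lang: str) -> str:
    """Adjust only the first character; unknown kinds are left untouched."""
    casing = resolve_casing(kind, lang)
    if casing is None or not label:
        return label
    if casing is Casing.UPPERCASE:
        first = label[0].upper()
    else:
        first = label[0].lower()
    return first + label[1:]


print(apply_casing("polish", "language name", "en"))
print(apply_casing("Teacher", "profession", "en"))
```

Leaving unknown kinds untouched keeps the sketch conservative: "special concepts", which may or may not be capitalized, would simply not be listed.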

Updates as part of documentation overhaul

Hi all, I recently made substantial edits to Help:Label as part of a larger sitewide documentation overhaul (more info on this here).

To compare my edits with the previous version please see the diffs here

Major changes include the following:

  • removed section on Lists for Wikipedia articles - I didn't think this was so complicated a case to merit its own section; it's also my understanding that it only applies to Wikipedia (i.e. the other Wikimedia sites don't have "list of" articles) so in the interest of balance I removed it
  • removed old screenshot - I have replaced it with a newer one (knowing full well that this too will soon need to be replaced as per the UI redesign)
  • updated content so it now refers to Wikimedia sites more generally vs. just Wikipedia articles which was the case before
  • moved around content and updated section headings to be similar to those of the Help:Description page (i.e. first cover general principles and then language-specific guidelines)
  • changed references to "entity" and "entities" to "item" and "items" - I think this is less confusing for newcomers and in any case the guidelines are more intended for items (not properties)
  • added example of using unicode

Please let me know if you have any concerns about these changes or suggestions on further improving the documentation. Here are issues I would also like specific feedback on:

  • What needs to be done to get this page from proposed to accepted guideline like Help:Description?
  • are there too few screenshots? Is the one in there helpful?
  • Should this page include information on all entities rather than just items (i.e. info on properties and the soon-to-be-added queries)?

Thanks. -Thepwnco (talk) 22:08, 18 June 2014 (UTC)Reply

In my opinion, the content of the page focuses far too much on trying to make Wikipedians familiar with Wikidata. I would make the content more generic and remove all the "compared to Wikipedia"-like statements. That would basically mean a rewrite of nearly all content. Maybe there could be a single section for all such Wikipedia comparisons if these are really necessary (the section Items without pages on Wikimedia sites probably needs to remain in some form anyway).
Another observation concerns the glossary: "Subject" as well as "entry" should probably be replaced by "Wikidata item"/"item".
Screenshots: Just like I proposed for descriptions, I could think of a screenshot of search results (or search box suggestions) that visualize that labels do not need to be unique. Screenshots are especially helpful for technically less versatile users getting to know the project - why not have a very simple screenshot of the top of an item page with the image description that the label is the big bold text on top of the item page - something like that. There is already the screenshot of the empty label but that is likely not the first visualization of a label new users will see when visiting Wikidata.
(I did more or less just scan the page since I think, a more or less fundamental rewrite would be appropriate anyway.) Random knowledge donator (talk) 15:53, 25 June 2014 (UTC)Reply
Thanks for your feedback and suggestions about screenshots. Can you specify if your comment on Wikipedia concerns just the Help:Label page or all of the Help pages? As mentioned above, my recent updates to the Help:Label page attempted to reduce the focus on Wikipedia by making reference to all Wikimedia sites. I can't find any phrases with "compared to Wikipedia" so please be more specific about what you think is confusing or could be improved upon. I also think that still some mention of Wikimedia sites is necessary, especially for providing information on labels (which often are derived to some extent from Wikipedia article or other Wikimedia sites page titles as per the notability criteria). Thepwnco (talk) 20:08, 25 June 2014 (UTC)Reply

Political entity labels

I have opened a topic on applying Help:Label to the specifics of administrative entities at Wikidata_talk:Political_geography_task_force#Political_entity_labels. I looked around and couldn't see where this specifically had been discussed, but if such a discussion exists or I should move the discussion here instead of where I put it, please let me know. Please take a look and provide any commentary and insight you might have on it. Thanks! Joshbaumgartner (talk) 07:04, 9 July 2014 (UTC)Reply

Capitalization of class names

Wikipedia applies the capitalization rules of the Chicago Manual of Style for class names if they follow the base name.

In edition 16 §8.50 Political divisions—capitalization [1] it says:

Jiangxi Province. 
Massachusetts Bay Colony; the colony at Massachusetts Bay.
New York City; the city of New York

In edition 16 §8.52 Mountains, rivers, and the like [2] it says:

Names of mountains, rivers, oceans, islands, and so forth are capitalized. 
The generic term (mountain, etc.) is also capitalized when used as part of the name.

I suggest this should be applied for English labels in Wikidata. There are other manuals of style that have other policies, but in one database one would normally use one system only. Tamawashi (talk) 17:32, 9 July 2014 (UTC)

  Support Since disambiguation words should not be included in the label at all, this should not be too much of a problem. For the most part, if an imported label from en:wiki has a non-capitalized disambiguation word included in it, that should simply be removed and put in the description (see the M1 chemical mine). Where common usage of the proper name includes the 'generic' word in general use (not just to differentiate from similarly named entities), it should be retained (e.g. Salt Lake City), but if the 'generic' word is not universally used or only used for disambiguation, it should not be in the label (e.g. Rhine is correct as opposed to "Rhine River" or "Rhine river".) Joshbaumgartner (talk) 08:05, 10 July 2014 (UTC)Reply
@Joshbaumgartner: - At least on capitalization we seem to agree. For class name inclusion I made a proposal below. Tamawashi (talk) 12:12, 10 July 2014 (UTC)Reply
@Tamawashi:, I hope to agree on more than merely this. Joshbaumgartner (talk) 23:08, 12 July 2014 (UTC)Reply

Regarding the scope of the new rule: CMoS 8.50 and 8.52 do not include all physical objects, not even all geographical objects, e.g. dog breeds and streets are not included. I would like to apply the rule as broad as possible. AFAICS dog breeds and streets currently are capitalized. Tamawashi (talk) 09:35, 13 July 2014 (UTC)Reply

Inclusion of class names

@Joshbaumgartner: If an item that contains a classname (e.g. Black River, or Washington County) is an instance of an item that is named "<classname>" (e.g. river) or "<classname> of <territorial entity>" (e.g. county of the United States) then the classname is a candidate for removal. Bots could help in monitoring already. But there seem to be no bot-checkable rules yet for when it should be removed and when not. Classes could be included in a "list of items having bot enforceable class name inclusion" and marked as "include" or "remove". A non-controversial class seems to be U.S. state (Q35657), where no instance should be labeled "Foo State", i.e. it would be marked "remove". This will not work for rivers if "Rhine" and "Black River" are to co-exist, since both are instances of the same class. For administrative territorial entities it could work fine. I started Wikidata:Administrative territorial entity and proposed a listing that helps in monitoring the classes, including some label monitoring: Wikidata talk:Administrative territorial entity/List of subclasses. Ideally all of them that have instances, and not only subclasses, will get listed in the "list of items having bot enforceable class name inclusion". That avoids manual revert wars on individual items. Tamawashi (talk) 12:12, 10 July 2014 (UTC)

@Tamawashi:, I think that working towards a guideline of how label names should be constructed for different geographic entities is a good thing, though I'm not sure it warrants a new discussion since there are several discussions on this already brewing, including Wikidata_talk:Political_geography_task_force#Political_entity_labels. Creating lots of new pages and topics all over the place will make it hard to bring together voices to collaborate on the work. Joshbaumgartner (talk) 23:45, 12 July 2014 (UTC)Reply
@Joshbaumgartner: Avoiding duplication can save time, and it seems we both would like that. If the discussion for the new rule/guideline/policy takes place in a page that is dedicated to all items that the rule shall be applied to, then it is more likely that interested people find it and the likelihood of duplication is reduced. Since Help_talk:Label is dedicated to all items (Wikidata:Glossary#Item), it looks like a candidate, but there might be a lower level page that is dedicated to all these items. The political geography TF page seems to be dedicated to a subclass of items that does not include all items the new rule should be applied to. I don't know how "political entity" is defined; would it include all proclaimed territorial entities, e.g. en:irrigation districts? Tamawashi (talk) 09:27, 13 July 2014 (UTC)
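
The "include"/"remove" listing proposed at the start of this section could be checked by a bot roughly as follows. This is a minimal Python sketch under stated assumptions: the policy table, the function name check_label, and the simple "label ends with the class word" test are all hypothetical. Only the U.S. state (Q35657) entry comes from the discussion; any further entries would need consensus first.

```python
from typing import Dict, Tuple

# Class item ID -> (class word as it appears in labels, action).
# Only Q35657 (U.S. state) is taken from the discussion above.
CLASS_NAME_POLICY: Dict[str, Tuple[str, str]] = {
    "Q35657": ("State", "remove"),
}


def check_label(label: str, class_id: str) -> str:
    """Return 'ok', 'violation', or 'unlisted' for one English label."""
    entry = CLASS_NAME_POLICY.get(class_id)
    if entry is None:
        # Classes such as river stay unlisted, because "Rhine" and
        # "Black River" are both acceptable labels.
        return "unlisted"
    class_word, action = entry
    has_word = label.endswith(" " + class_word)
    if action == "remove":
        return "violation" if has_word else "ok"
    if action == "include":
        return "ok" if has_word else "violation"
    raise ValueError(f"unknown action: {action!r}")


print(check_label("Washington State", "Q35657"))
print(check_label("Black River", "some-unlisted-class"))
```

A suffix test like this is crude, so a real bot would presumably only report candidates for human review rather than edit labels itself.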

Languages labels

See discussion. --SynConlanger (talk) 21:08, 17 August 2015 (UTC)Reply

Title case

"Use the item's most common name, and only capitalize proper nouns"

What about the title case typography often found on movies/books/articles titles in English? I often see users remove the capital letters arguing that common nouns should not be capitalized. Like "The Empire Strikes Back" becomes "The empire strikes back" or "Back to the Future" becomes "Back to the future". Thibaut120094 (talk) 02:01, 18 August 2015 (UTC)Reply

Oh, just saw Help:Label#Capitalization, that answers my question then. Thibaut120094 (talk) 02:12, 18 August 2015 (UTC)

Bad Example in the present help page

"The label of Helianthus annuus (Q171497) is the common name, while the scientific name (Helianthus annuus) is featured as an alias." Huh? - Jmabel (talk) 00:11, 9 December 2015 (UTC)Reply

How discretionary, unreferenced labels violate a NPOV

The statement "In fact there are several cases, discussed below, in which it is actually desirable for the Wikidata label to be different from the Wikimedia page title" is inconsistent with a NPOV. Importantly, a "common usage" definition for a "discretionary" preferred label is not spatially or temporally verifiable. If not derived from the Wikimedia page title or a "semantically equivalent" referenced source, the label contains a localized bias imposed by the author on the entity concept.

Further, the statement "When it comes to scientific names, for example, of a species, labels should use a species' common name, however items must always also have the scientific name listed as Alias. If a species has several common names, a reasonable effort should be made to determine which of them is the most commonly used, e.g. by consulting references." seems to be a reasonable guideline. However, most of the time, there are multiple answers to the research problem of "most commonly used", particularly when evaluating an entity concept label in a temporal or spatial context. Why then is this rather important judgment of "preferred common usage" left to editor discretion? This dependency on judgment introduces a bias that violates the NPOV, and if "edit wars" have been spawned from this possibility, there seems to be a need to fix the problem. In a scientific context, the "preferred label" is the scientific one. In a common usage context, the "preferred label" is the common one. They are both right. This is an obvious polycontexture and needs to be semantically expressible in Wikidata.

It is my belief that the preferred label should either represent the Wikipedia page title as it exists, or an extended language property like dcterms:language should be added to provide in what "preferred" language context the entity is known by this label (<entity> skos:prefLabel "Helianthus annuus"@en; dcterms:language <http://example.org/zxx-x-taxon>). (Or alternatively, the data model should support preferred label, alias and description references.) Furthermore, a "hidden label" alias should be added (skos:hiddenLabel) that provides an opportunity to provide externally-sourced perhaps politically or grammatically "incorrect" alternative labels. Chjohnson39 (talk) 14:12, 20 March 2016 (UTC)Reply

All accepted languages

Somewhere around there should be shown the full list of accepted languages for labels (with their codes). --XXN, 11:36, 23 June 2017 (UTC)Reply

Labels for items which are sub-pages on Wikivoyage

I see that items taken from Wikivoyage sub-pages have the full Wikivoyage Pagename/Subpagename as a label. (I have only checked a few, so this may not always be true) My understanding of the general naming policy is that this is probably due to them having been harvested by a bot, and not yet improved. Am I correct in assuming that these should be changed to labels which reflect actual usage, i.e without the main article name in front, and the full Pagename/Subpagename should be given as an alias? The description will generally be adequate disambiguation in the case of dive sites, which may share a name with another site in another region, but when the location is specified, they are uniquely identified. Pbsouthwood (talk) 13:14, 7 October 2017 (UTC)Reply

I have started to edit dive sites of the Cape Peninsula and False Bay following this principle (there are about 250 of them). See Q15266796 for the first. I am also adding country, instance of, geolocation, and where it exists, the detail bathymetric chart on Commons. If there are other properties anyone can recommend for dive sites, please let me know. Pbsouthwood (talk) 05:15, 8 October 2017 (UTC)

To-do list for guideline status?

Is there a to-do list, for what we need to do / decide / discuss, to get this page confirmed as an agreed upon guideline? Quiddity (talk) 18:55, 9 October 2017 (UTC)Reply

@Quiddity: Unfortunately I have not seen much movement on that. It would be good to at least come up with a breakdown of what parts can be agreed to and which parts remain under discussion. Josh Baumgartner (talk) 01:48, 19 January 2021 (UTC)Reply

Plural/singular

I propose to add a new general rule for all languages: label has to be in the singular form if the concept can be used in the singular form. This is a general recommendation for ontology building. Snipre (talk) 20:42, 3 January 2018 (UTC)Reply

Ex.: tree (Q10884)

What about Olympic symbols (Q381360)?--Ssola (talk) 20:34, 15 January 2018 (UTC)Reply
@Ssola: In my opinion, that item's label should be "The Olympic Rings" and in that case, since the symbol is made up of multiple rings, it cannot be used in a singular form, which would not violate the rule/guideline. U+1F360 (talk) 05:44, 19 August 2018 (UTC)
JFYI, this item is not only about rings... --Infovarius (talk) 21:26, 17 March 2019 (UTC)Reply
@Snipre: I think this should only apply to classes. Occasionally we have non-classes that are plurals (usually prefixed with "the" in common speech, but not in the labels), but technically can be used in the singular, even if that use is quite rare. --Yair rand (talk) 21:00, 12 March 2019 (UTC)Reply
Come to think of it, this would also make it possible to tell how we're treating things like Anglic (Q1346342) (whether they're a class or not). --Yair rand (talk) 00:44, 14 March 2019 (UTC)Reply
Somewhat related: Should there also be a rule requiring labels to be in noun form? --Yair rand (talk) 05:00, 19 March 2019 (UTC)Reply

HTML formatting for labels

I saw that a discussion on HTML formatting (section #Italics) was initiated in 2012. In bugzilla:41749, the development team said they will not consider implementation of such formatting and daniel (Duesentrieb (talk • contribs • logs)) suggested to "make a page somewhere and collect the use cases, so we can discuss what to do about them". I just contacted him on his WP:de user page, but he seems to be no longer active there. I thus copy the message I wrote:

In the French Wikipedia, we have a template fr:Modèle:Bibliographie which is equivalent to the English one en:Template:Cite Q. It permits calling Wikidata items relative to books and articles, instead of filling en:Template:Cite book and en:Template:Cite journal by hand!
It is incredibly convenient, as Wikidata now possesses items for millions of scientific articles (most of the database growth in recent months was because of those guys). Titles of these articles, though, regularly include italics. For instance, the "title" property of the Wikidata item Evidence suggesting that Homo neanderthalensis contributed the H2 MAPT haplotype to Homo sapiens (Q33988883) should render "Evidence suggesting that Homo neanderthalensis contributed the H2 MAPT haplotype to Homo sapiens", as you can see on the publisher page. Hence, it is a great issue that Wikidata cannot handle such basic HTML tags as <i>, <sup> or <sub> (the latter two are essential for chemists, in particular!).
You may know the Wikidata Concepts Monitor showed that items about scientific articles were the least used on Wikipedias when compared to their abundance. I doubt it will change soon if so many outputs include typographic bugs that cannot be corrected... I hope the development team will consider this flaw, or at least will not reject it for good!!

Does anyone know where I could post this remark for someone to take it into account? Thanks for your help. Cheers, Totodu74 (talk) 16:25, 17 January 2018 (UTC)

  • HTML tags aren't essential. What might be essential is formatting. If we introduce formatting we have to choose between the different semantics. In addition to HTML there's also Wikitext and markdown. I personally would prefer markdown. ChristianKl 18:29, 17 January 2018 (UTC)
    Hi ChristianKl, thanks for your answer. I am very interested in your alternatives. Could you exemplify how to correctly type "Evidence suggesting that Homo neanderthalensis contributed the H2 MAPT haplotype to Homo sapiens" on Evidence suggesting that Homo neanderthalensis contributed the H2 MAPT haplotype to Homo sapiens (Q33988883)? Totodu74 (talk) 19:18, 17 January 2018 (UTC)Reply
    Currently, neither of those ways is integrated, but if we integrate one, we will have to decide on one way of formatting, and I don't think HTML is the best. ChristianKl 20:03, 17 January 2018 (UTC)
    It's been a while but using unicode italics seems to do the job: "Evidence suggesting that 𝐻𝑜𝑚𝑜 𝑛𝑒𝑎𝑛𝑑𝑒𝑟𝑡ℎ𝑎𝑙𝑒𝑛𝑠𝑖𝑠 contributed the H2 𝑀𝐴𝑃𝑇 haplotype to 𝐻𝑜𝑚𝑜 𝑠𝑎𝑝𝑖𝑒𝑛𝑠" works well and is easily rendered by everybody correctly. ChristianKl 20:42, 10 December 2020 (UTC)
    No, no, no! Do not use mathematical characters for type style. This completely breaks accessibility. If you don’t use a screen reader, for example, you can check this if you have a Mac. Select the text in your web browser, select Edit > Speech > Start Speaking. For me, the voice reads “evidence suggesting that mathematical italic capital aitch, mathematical italic small oh, mathematical italic small em, mathematical italic small o, . . .” Takes forever. Michael Z. 21:20, 10 December 2020 (UTC)Reply
    Also, this data can’t be searched.
    Also, this text will become illegible in contexts that don’t support this Unicode range or lack supporting fonts.
    Also, it may unexpectedly fail validation. Michael Z. 01:29, 11 December 2020 (UTC)
    Accessibility is an implementation issue of the screen reader. These days most contexts support those Unicode blocks.
Well implemented search normalizes characters. The top search result for 𝐻𝑜𝑚𝑜 𝑛𝑒𝑎𝑛𝑑𝑒𝑟𝑡ℎ𝑎𝑙𝑒𝑛𝑠𝑖𝑠 is Neanderthal (Q40171). ChristianKl 11:56, 11 December 2020 (UTC)
Empty aphorisms. These characters are not normalized as text—because they are math symbols, not intended to be normalized as text. We can isolate this effect by searching for page title: no match. Of course, what goes into a database label will get similarly isolated when reused in other contexts. You may as well use Cyrillic characters, Greek, L33T speak, or an animated gif. Michael Z. 15:52, 11 December 2020 (UTC)Reply
Also, that argument boils down functionally to “screen readers should do what I want them to do, so screw the disabled.” Michael Z. 15:58, 11 December 2020 (UTC)Reply
I just tried reading the page in Apple’s VoiceOver screen reader. It reads the title as “Evidence suggesting that contributed the H2 haplotype to,” ignoring all the “italics.” This kludge fails in one of the most widely available screen readers. If you have Windows, I’d like to hear what MS Narrator makes of it. Michael Z. 16:12, 11 December 2020 (UTC)Reply
Here’s a blog post with video demo, written by a web professional, of the (nonexistent) usability of such “text” in some screen readers. And another, showing how transclusion can break it further—in this case a Tweet embedded in Wordpress, but the principle of unpredictability directly applies to WikiData. —Michael Z. 17:23, 11 December 2020 (UTC)Reply
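
The disagreement above about whether these characters are "normalized as text" can be checked directly with Python's standard unicodedata module. The sketch below is only an illustration: it shows what the Unicode character names and the standard normalization forms do, and says nothing about which form any particular search engine, validator or screen reader actually applies.

```python
import unicodedata

# "Homo" written with Mathematical Alphanumeric Symbols, given as escapes
# so the example does not depend on font or editor support.
styled = "\U0001D43B\U0001D45C\U0001D45A\U0001D45C"

# The formal character names, which some screen readers read aloud.
names = [unicodedata.name(ch) for ch in styled]

# Canonical normalization (NFC) keeps the math symbols as they are;
# compatibility normalization (NFKC) folds them to plain letters.
nfc = unicodedata.normalize("NFC", styled)
nfkc = unicodedata.normalize("NFKC", styled)

print(names[0])          # MATHEMATICAL ITALIC CAPITAL H
print(nfc == styled)     # True
print(nfkc)              # Homo
print("Homo" in styled)  # False: a plain substring search does not match
```

So a consumer of the label finds "Homo" only if it applies compatibility folding (NFKC or similar) before comparing; with canonical normalization or none at all, the styled text and the plain text are different strings.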

Hatnote for redirect Lexicographical data

Hello, do we need a hatnote on Help:Label along the lines of "WD:L redirects here; for Lexicographical data, please see Wikidata:Lexicographical data"? Thanks. --Titodutta (talk) 21:09, 17 June 2018 (UTC)

Length Limit

What is the exact limit for the length (number of characters or words)? – The preceding unsigned comment was added by Mahdimoqri (talk • contribs) at 21:33, 17 November 2018 (UTC).

I've just attempted to add a very long label. The error message said: "Label must be no more than 250 characters long". Matěj Suchánek (talk) 10:41, 25 November 2018 (UTC)Reply
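
For tools that create labels in bulk, the limit quoted above can be checked before submitting. This is a minimal Python sketch with a hypothetical function name. The error message does not say how "characters" are counted, so counting Unicode code points (what len() returns in Python 3) is an assumption here.

```python
MAX_LABEL_LENGTH = 250  # taken from the error message quoted above


def label_length_ok(label: str) -> bool:
    """Check a label against the 250-character limit.

    Assumption: the limit counts Unicode code points. The error message
    does not state whether bytes or some other unit is meant.
    """
    return len(label) <= MAX_LABEL_LENGTH


print(label_length_ok("a" * 250))
print(label_length_ok("a" * 251))
```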

Disambiguation of property labels

User:Jura1 and I are disagreeing on the English-language disambiguation of property labels (which are not mentioned in Help:Label as they have correctly pointed out). If you have a strong opinion on the matter, please contribute to the discussion at Property talk:P454#Label disambiguation. Thank you! — OwenBlacker (talk; please {{ping}} me in replies) 21:09, 28 December 2018 (UTC)Reply

Category and template labels

In this discussion, Wikidata administrator ChristianKl encourages renaming category labels in English despite their names in the English Wikipedia. A case in point, English Wikipedia's Category:Czech folk music groups was renamed to Category:Czech contemporary folk music groups in Wikidata by ŠJů. ChristianKl fully supports ŠJů, saying: "There's nothing arbitary about finding names on Wikidata that are intend to reflect what a category is about instead of names that match external names 1-to-1."

However, that's contrary to what Help:Label says: "But note that for other client wiki namespaces like "Category" and "Template" the labels should be identical to sitelinks (the disambiguation parts shouldn't be removed), as these types of pages usually have only one common type of descriptions and there may occur API errors of non-unique pair consisting of label + description when trying to set descriptions to other items."

I ask to give an expert assessment on this case. Thanks.--Russian Rocky (talk) 11:28, 1 October 2019 (UTC)Reply

  • To make the case here, the label&description exist to allow Wikidata users to understand what an item is about. In the case of Category:Czech folk music groups, the fact that the category as it's used in EnWiki is about contemporary folk music is meaningful information that's valuable to communicate to Wikidata users. As we have default descriptions for category items it makes sense to store that information in the label. ChristianKl 15:25, 1 October 2019 (UTC)
    • How do you define when information is meaningful? Do you have a set of criteria, which you can share with us? It's a Wikipedia category to begin with. If you believe the new name is more meaningful for English speaking users, then why don't you nominate this category for renaming in the English Wikipedia and convince people there? It'd be much better, because I don't think this stealthy renaming practice towards categories is really the right choice.--Russian Rocky (talk) 16:22, 1 October 2019 (UTC)Reply
  • There can be cases (and I have met some) where there is a category item without an English label which could be translated by the same English label, but the items cannot be merged because of some other language. Then the label of the first item should be adjusted even if it deviates from the original English title. --Infovarius (talk) 15:23, 4 October 2019 (UTC)
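
The constraint quoted from Help:Label above (a label + description pair must be unique within a language) can be illustrated with a short Python sketch. The data layout, the function name find_collisions and the item IDs "item-A" and "item-B" are hypothetical; the sketch only shows why two category items with a shared default description cannot also share a label.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: item ID -> {language code: (label, description)}
Terms = Dict[str, Dict[str, Tuple[str, str]]]


def find_collisions(items: Terms) -> Dict[Tuple[str, str, str], List[str]]:
    """Group item IDs that share the same (language, label, description)."""
    seen: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for item_id, terms in items.items():
        for lang, (label, description) in terms.items():
            seen[(lang, label, description)].append(item_id)
    # Keep only the triples used by more than one item.
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


items = {
    "item-A": {"en": ("Category:Czech folk music groups", "Wikimedia category")},
    "item-B": {"en": ("Category:Czech folk music groups", "Wikimedia category")},
}
print(find_collisions(items))
```

Changing either the label or the description of one of the two items makes the collision disappear, which is the situation both sides of this thread are arguing about.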

Review of

At Wikidata:Project_chat#Strange_book_entries, there is a proposal to normalize English labels for labels like the above to "review of "+title of work. Please comment there. --- Jura 09:20, 8 October 2019 (UTC)Reply

The referenced discussion never really got off the ground and got archived, but I like this idea. Calliopejen1 (talk) 07:09, 26 February 2020 (UTC)Reply
  • Support The discussion has been archived at Wikidata:Project chat/Archive/2019/10#Strange book entries. Publishers use a variety of reviewed book information as the title that I think should be appropriately included as the title (P1476). However, this is an unusual case where title (P1476) is not the common name. In discussion, I actually think "[REVIEW AUTHOR]'s review of [BOOK TITLE WITHOUT SUBTITLE]" would be the clearest common way of referring to the item. However, I think "review of [BOOK TITLE WITH SUBTITLE]" is adequate. "review of [BOOK TITLE WITHOUT SUBTITLE] in [PERIODICAL NAME]" is also an option. Perhaps these could all be included as aliases. Daask (talk) 20:19, 26 March 2020 (UTC)Reply
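
The three label patterns mentioned above can be generated mechanically, for example to fill in aliases. This is a minimal Python sketch: the function name and parameters are hypothetical, joining title and subtitle with ": " is an assumption, and the sample values are placeholders rather than a real publication.

```python
from typing import List, Optional


def review_label_candidates(
    title: str,
    subtitle: Optional[str] = None,
    review_author: Optional[str] = None,
    periodical: Optional[str] = None,
) -> List[str]:
    """Build the candidate English labels discussed above for a book review."""
    # Assumption: title and subtitle are joined with a colon and a space.
    full_title = f"{title}: {subtitle}" if subtitle else title
    candidates = [f"review of {full_title}"]
    if review_author:
        candidates.append(f"{review_author}'s review of {title}")
    if periodical:
        candidates.append(f"review of {title} in {periodical}")
    return candidates


for candidate in review_label_candidates(
    "Example Book", "A Subtitle", "A. Reviewer", "Example Journal"
):
    print(candidate)
```

The first candidate could serve as the label and the rest as aliases, as suggested in the comment above.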

Dual labels

Is it acceptable to use multiple names within a label, or should these always be included as aliases? For example Kyiv (Q1899) (Kyiv/Kiev, unclear which spelling is common) Forest (Q72946) (Forest/Vorst, Dutch name is very common, even in English). I'm inclined to explicitly disallow this in Help talk:Label. What do others think? Daask (talk) 21:59, 25 March 2020 (UTC)Reply

"The label is the most common name that the item would be known by": it seems to me "Kyiv/Kiev" is clearly not the most common name the city is known by. Why do you think this needs to be made explicit? ChristianKl 08:33, 26 March 2020 (UTC)
@Daask: No, the label should be a single name - pick one, and put any others in as aliases. ArthurPSmith (talk) 17:32, 26 March 2020 (UTC)Reply
@ChristianKl: Somehow I managed to breeze right past that description when I skimmed the page for material relevant to this issue. I still think it would be better to explicitly disallow dual names, even when there is no single "most common name". Daask (talk) 20:57, 26 March 2020 (UTC)Reply
Then we end up with naming wars. There are other use cases for multiple labelling, such as given names. They are defined in one script (in Wikidata) and so they usually have several transliterations to other scripts. And frequently there is no single "most common name". And it is impossible to designate a single variant, as this would contradict some fraction of uses. --Infovarius (talk) 14:04, 27 March 2020 (UTC)
The fact that there are multiple names in use doesn't prevent some name from being the most common one. I generally think there might be a case for improving on the existing rule. I think there are many cases where it would be good to have consistent naming inside of Wikidata. I for example think that we should use Latin names over Greek for all anatomical structures even when there are a few where the Greek name gets used more frequently than the Latin name. ChristianKl 15:12, 28 March 2020 (UTC)

Translating labels based on popularity

Hello. I'm wondering what the recommendations are for translating labels. I see there is the popularity rule when it comes to choosing between two possible labels of the same language, but that seems inadequate when going from one language to another. A term in one language will inherently be more popular than its counterpart in another, but a term from one language shouldn't be used as the label for another. For instance, not having a Bosnian translation in the label for potato, or using the English one instead of krompir because it inherently has millions fewer Google results than the English term, doesn't seem right. Also, sometimes things are translated based purely on translation services, which don't account for popularity. Also, a few users have said that we are supposed to provide a source for the translation so it can be verified later, but I see no way to do that, since labels do not have a box to enter references and you can't leave changeset comments. Plus, you won't have a source if the label was translated using a translation service anyway. --Adamant1 (talk) 22:36, 19 April 2020 (UTC)Reply

@Adamant1: The rule is based on how common the term is in text written in the language in question. In language X (say "Bosnian") it is irrelevant how a particular concept Y (say "potato") is written in any other language, all that matters is how Y is referred to in texts written in language X. Google results not filtered by language are also irrelevant in such an assessment. As you point out, Wikidata does not provide a mechanism to add a reference to labels, descriptions and aliases as such; if you believe a particular value is correct and somebody else questions it, then you can provide that reference elsewhere, for example on the item talk page, or as a reference on a relevant property statement (such as official name (P1448), title (P1476), etc.) ArthurPSmith (talk) 15:09, 20 April 2020 (UTC)Reply
First, it's not a rule. I'm sick of people treating it like it is. If you think a proposal should be followed, cool, follow it then, but you can't force everyone else to or use it as a revert excuse. Second, you said it's based on popularity. So, popularity is just confined to usage within a certain language then? If so, how come a lot of the translations are written in language Y instead of language X, without accounting at all for how much it's used in language X? I'm just not seeing how your claim fits reality. Refer to Mathew hk's claim that Cooperative Credit Bank of Rome was wrong because only the Vatican uses it (when it's their company) and that we should go with the Italian instead, because that's what the Economist goes with.
Also, fine, it's not about Google hits. Mathew hk reverted me based on Google Ngram results and you've repeatedly used "popularity" as a reason to revert me though. So say I 100% buy your premise. Then how do I determine popularity if it's not either of those? Mentions in a magazine? What? You can repeatedly say to use popularity, but it's extremely unhelpful if you can't say exactly how we do that. Or is it just some relativistic thing?
Yeah, cool, lovely, provide a source when someone asks you. I get it. What about situations where the label was changed a few weeks ago (or whenever) and the person just doesn't have the source anymore, but there was one? Or what about when the translation is from a translation service and there is no source? Do we just delete or change the label in those situations based on personal opinions? Also, why would the answer to lack of sourcing be replacing the label with one from another language? We don't replace unsourced birthdays (or whatever) with someone else's birthday. We don't usually delete them either. There's unsourced stuff all over the place without issue. Even in non-English labels, or they are just left blank if there's no good or "popular" translation. Why are English labels treated differently or why is it OK to intentionally introduce bad data into a label based on a personal opinion that the current form of the label is wrong? Also, note that your whole premise that people should provide the sources when someone asks is completely BS because both you and Mathew hk reverted me either before asking or didn't even ask in the first place. So, I can't provide a source when I'm not asked and I'm sure as hell not going to after you've already aggressively reverted all the labels I edited. There is no "revert now because you think it's wrong and then ask later, or just don't and continue reverting the person" rule. Seriously. --Adamant1 (talk) 02:47, 21 April 2020 (UTC)Reply
BTW, it's extremely flawed to say the popular English term for an Italian company like Crédito Agrícola is the Italian spelling just because the Economist calls it that. There is no "popular" way to say it in English, because it's not an English company and people in most of the English-speaking world don't call it anything. The Economist doesn't determine "commonality." Neither does Bloomberg BTW, because they just aren't common companies or concepts in English. Which is why I think it should be translated into English and we should call it a day. The Italian is still available in Italian and comes up in searches. So it has zero effect on anything. --Adamant1 (talk) 02:46, 21 April 2020 (UTC)Reply
@Adamant1: A few points: (1) "Common" and "popular" are not quite the same thing. This page says labels should "reflect common usage". All text is written for an audience, so "common usage" in my view simply means finding sources that write (in the given language) for the general public if possible, rather than a specialist or otherwise restricted audience. (2) Proper nouns and common nouns represent quite different entities; for common nouns, certainly one would expect a literal translation (of meaning, not necessarily of the individual words) to be the most appropriate label in a language in almost every situation. However for a proper noun, that is the name by which that entity is called. Names are special; both people and organizations have preferences in how they like to be called by others. That is often expressed by an "official" name in a given language. You've brought up WHO as an example - their website explains that WHO operates in 6 official languages and if you go to other language versions, for example Russian, you'll see the official name in that language, abbreviated ВОЗ. Many Chinese and Japanese institutions have an "official" English version of their name, helpful for communicating in Western countries, for example I was just yesterday looking at Xichang University, which you can see from the top of their website has both an official Chinese name (西昌学院) and an English one (Xichang University). If I just used "Google Translate" on the Chinese name I would think the literal translated name was "Xichang College", but they clearly have a preference for "University" in English, so that is the name that should be used. Using a phonetic transcription of the Chinese name instead for English would produce something like "Xīchāng xuéyuàn" - that may be useful for some purposes but it's probably not what a common text would use to describe them.
So I guess that's my final point - in my view (3) you should not be relying on automated translations or transcriptions unless there's nothing else available. First look to see what the person or organization prefers to be called; if there's no clear preference in your language or a close relative, look for reputable sources writing about the person or organization for the general public. ArthurPSmith (talk) 12:51, 21 April 2020 (UTC)Reply
@ArthurPSmith: I don't know, that sounds like nuance to me and I thought there wasn't any. Really though, that's pretty much what I've already been doing. Yet it still causes problems. A lot of it comes from direct translations. Certain people see one and automatically think it's probably wrong. Plenty of companies directly translate their business name. Even in some cases where they don't though, it's still OK to directly translate the name if that's what you're left with. So, there needs to be more leeway and less knee-jerk reacting. I've created entries for some pretty obscure historical "foreign" companies that are barely talked about in their native language, let alone in English. I'm 100% fine going with a direct English translation in those cases, because this is a knowledge base and IMO it's more important that English speakers are able to find the entry than not, just because they don't know some obscure non-English company name. That's the point of Wikidata. People are still more than able to find the company by the "foreign" name. So it makes zero difference. Except it improves things for English speakers. The same goes for some modern "foreign" companies. Whatever you want to say about proper nouns etc etc, I'd have never found a lot of interesting entries if the English label was in some other language. Putting it in another language because of Bloomberg or whatever just isn't helpful to "finding knowledge." Which is what we are here for. If an Italian speaker (or anyone) can find Banca Intesa by searching for Banca Intesa, but an English speaker can find it by searching for Intesa Bank or just "bank", then more power to them. They aren't going to search for Banca. I thought that was partly the point of labels. They aren't official titles. There's a statement for that. Like in your Chinese example, "Xichang College" is still better than nothing or just the Chinese for an English speaker finding them. Even if it is a literal translation.
But you'd probably revert it to the Chinese just because it translates to college instead of university. --Adamant1 (talk) 22:31, 21 April 2020 (UTC)Reply
All rules need to be applied with thought about their purpose, not thoughtlessly. But if we need a detailed breakdown of where primary labels on (proper noun-type) items should come from, here's my priority list:
1. If the person or organization etc. has been written about in the given language, select the version of that name most frequently used by reputable sources (newspapers, databases, etc.)
2. If the entity is not so frequently written about but has an "official" name in that language, use that official name.
3. If neither 1 nor 2 applies but there is an "official" name in a close language (same script, similar pronunciation rules), use the name from that language.
4. If none of the above apply, either a literal translation of the name in the more remote language, or a transcription of the pronounced version of the name may be appropriate (literal translation is more appropriate for organizations, probably never appropriate for people).
The description field can be used to add words from a literal translation of the name to aid in searching, in case 3 or at other times when the type of organization may not be clear from the name. ArthurPSmith (talk) 17:36, 23 April 2020 (UTC)Reply
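The priority list above can be read as a simple fallback chain. The following is only a minimal Python sketch of that reading: the `NameEvidence` container, its field names, and `choose_label` are hypothetical and not part of any Wikidata tool. It also simplifies step 1 by treating any recorded usage in reputable sources as sufficient, whereas the list distinguishes entities that are frequently written about from those that are not.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NameEvidence:
    """Hypothetical record of what an editor found for one entity and one language."""
    source_usages: List[str] = field(default_factory=list)  # names seen in reputable sources
    official_name: Optional[str] = None                     # official name in the target language
    close_language_official_name: Optional[str] = None      # official name in a close language
    literal_translation: Optional[str] = None
    transcription: Optional[str] = None
    is_person: bool = False


def choose_label(ev: NameEvidence) -> Optional[str]:
    """Walk the four steps in order and return the first candidate that exists."""
    # Step 1: the version most frequently used by reputable sources.
    if ev.source_usages:
        return Counter(ev.source_usages).most_common(1)[0][0]
    # Step 2: the official name in the target language.
    if ev.official_name:
        return ev.official_name
    # Step 3: the official name in a close language.
    if ev.close_language_official_name:
        return ev.close_language_official_name
    # Step 4: a literal translation (not for people), otherwise a transcription.
    if not ev.is_person and ev.literal_translation:
        return ev.literal_translation
    return ev.transcription


# The Xichang example from this thread: the official English name wins
# over the literal machine translation.
xichang = NameEvidence(official_name="Xichang University",
                       literal_translation="Xichang College")
print(choose_label(xichang))
```

A sketch like this cannot settle what counts as a reputable source or a close language; those remain judgment calls for editors.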
@ArthurPSmith: Well, at least you're willing to admit it's your priority list. Even if you're still wrongly asserting there's a rule you're basing it on when there isn't. Personally, I don't think reverting people based on my priorities would be valid (especially if they aren't based on a rule), but I accept that you do at least. Even if you don't accept my position on it, that a proposal for a guideline isn't a rule and a proposal isn't a valid reason to revert people based on your interpretation of it. Even though I mostly agree with your points, with some small deviations, it can't validly be argued IMO that the proposal is as clear about things as your priorities are. It doesn't even address some of your points, which makes sense because again they are your priorities. I have my own. Which IMO are actually closer to the guidelines than yours are, because I'm not insinuating it says things in a way it doesn't like you are. I don't revert people based on my priorities either. That's the difference that makes the difference. Do things your way, don't force other people into doing them your way though. I'm not sure what's so difficult about that. Feel free to put the effort into turning yours into an actual guideline. Then maybe I'll 100% follow it. It does zero good to get in edit wars or arguments with other users about it in the meantime though. Hopefully everyone is done with this now and it doesn't come up again unless it actually becomes a rule. --Adamant1 (talk) 09:27, 27 April 2020 (UTC)Reply

Prefer regional language labels or fallback?

If a regional label is identical to the standard one, should it still be entered or left for the fallback? For example, should I enter a Canadian English label that’s identical to the English one, or leave it blank and rely on the language fallback unless the spelling differs?

I don’t know if this is related, but when entering values into items, some only show up as a Q number with no label, even though they have a label. My language preference is set to Canadian English. It looks like the language fallback is failing, sometimes, but I can’t discern a pattern. Or maybe this just happens to everyone? Michael Z. 17:27, 10 December 2020 (UTC)Reply

Regional English?

In language preferences and labels I see English, British English, and Canadian English. No US English, no Australian English, no Indian English, etcetera. Why is that?

Does “English” represent US English, “international” English, or unspecified English? How do we label specific varieties? Michael Z. 17:33, 10 December 2020 (UTC)Reply

  • https://www.wikidata.org/wiki/Help:Monolingual_text_languages describes the process for adding new language codes. The Phabricator task for en-us is at https://phabricator.wikimedia.org/T154589. ChristianKl 18:56, 10 December 2020 (UTC)Reply
    Thank you for the “why.” So how should we deal with it?
    Or at least, how does it work? For example, does display of content or labels always fall back to en when en-GB and en-CA are unset? I would infer that we should enter the regional varieties only when they differ from the umbrella.
    If I encounter an en label that uses British spelling, then should I pragmatically change it to US spelling, since British can be entered separately in en-GB? But then treating umbrella en as en-US is 1) linguistically unsound, as British English is the language’s trunk, and (North) American English a branch, and 2) open to accusations of tone-deaf lack of self-awareness, cultural imperialism (viz. the Phabricator topic), or at least unconscious systemic bias. We can’t not do something, so ignoring this question is self-defeating.
    Wikipedia deals with this by setting regional variety on an article-by-article basis within en, preferring subject appropriateness or falling back to the article creator’s usage (see w:en:MOS:ENGVAR). Shouldn’t we establish our own conventional handling of this as well, as Wikidata’s requirements are different from Wikipedia’s? Michael Z. 19:53, 10 December 2020 (UTC)Reply
    Yes, when en-GB and en-CA aren't set there is a fallback to en. When en isn't set it falls back to mul. I think there's agreement that en-GB and en-CA should not be filled when they are the same as en.
In cases where there are two groups of people who speak the same language but use a different script I don't think langcom's position that they have to use one Wikipedia and aren't allowed to have one Wikipedia per script has anything to do with Wikipedia's needs either. It's just langcom being a very insular organization that doesn't make good decisions.
It seems that the task for en-IN is also open at https://phabricator.wikimedia.org/T212313 . If you think Wikidata should have those language codes, speak up in the Phabricator tasks. If you want en-AU, open the Phabricator task for it.
When it comes to what names are used for individual items, it's up to local consensus, and we haven't had many problems with that when finding English names. ChristianKl 20:24, 10 December 2020 (UTC)Reply
Well, I think we should probably be able to record regional labels consistently without an external body approving each one. For labels, this is primarily US/UK and alternate spellings, and can be handled just as well with just en and aliases as it currently is. Differences in en-CA and en-IN will typically correspond to either US or UK, and sometimes there may be more specific local names, en-AU, en-UK-scotland, en-CA-QC, etc.
This affects lexemes in a big way, obviously, since dictionaries record regional spellings, pronunciations, senses, etc., and I think the solution there is by referencing regions (to virtually construct ad-hoc language subtags) and not relying on two-and-a-half language subtags as the model. But I have barely started to unravel the docs for lexemes.
By the way, what about the reverse: will empty en fall back to existing en-GB, for example, or will it only cascade upwards to mul? Michael Z. 21:33, 10 December 2020 (UTC)Reply
As far as I remember, fallbacks are organized in a tree, and thus en won't fall back to en-GB. Lexemes currently don't have that much documentation or community standards. Standards are still to be developed. ChristianKl 00:22, 11 December 2020 (UTC)Reply
Thanks. I have changed my language pref to just en, and will just add plain en labels in my local spelling convention from now on. This will avoid the most problems, and meets the letter and spirit of the framework in its current state. (If I decide to do the extra work, I would enter the same text in both en and en-CA.) —Michael Z. 16:24, 11 December 2020 (UTC)Reply
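For readers following the exchange above, here is a minimal Python sketch of the fallback behaviour as it is described in this thread: regional codes fall back to en, en falls back to mul, and en never falls back to a regional code. The `PARENT` map and both function names are illustrative assumptions, not the actual Wikibase implementation or its full fallback configuration.

```python
from typing import Dict, Optional

# Hypothetical parent map, built only from what is said in this thread.
PARENT: Dict[str, str] = {"en-ca": "en", "en-gb": "en", "en": "mul"}


def resolve_label(labels: Dict[str, str], lang: str) -> Optional[str]:
    """Walk up the tree from `lang` until a label is found."""
    code: Optional[str] = lang
    while code is not None:
        if code in labels:
            return labels[code]
        code = PARENT.get(code)  # None once the root has been passed
    return None


def is_redundant(labels: Dict[str, str], lang: str) -> bool:
    """True if deleting the label for `lang` would not change what is displayed."""
    if lang not in labels:
        return False
    without = {k: v for k, v in labels.items() if k != lang}
    return resolve_label(without, lang) == labels[lang]


labels = {"en": "color", "en-gb": "colour", "en-ca": "color"}
print(resolve_label(labels, "en-gb"))   # the regional label is used
print(is_redundant(labels, "en-ca"))    # identical to en, so fallback would suffice
print(is_redundant(labels, "en-gb"))    # differs from en, so it carries information
```

Under these assumptions, an en-CA label that merely repeats the en label adds nothing, which matches the practice suggested above of entering regional labels only when the spelling differs.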
ChristianKl, Michael Z (reopening this thread after a year, happy to start a new section if preferred): Phab:T154589 is closed as resolved & done in June 2021. I see en-us when filling out string properties,[1] but not for labels. Do we need a separate Phab ticket for labels?
——
[1] (Just displays as “en-us”, cf. “British English (en-gb)”, but that's a separate issue.)
⁓ Pelagic (messages) 01:55, 5 January 2022 (UTC)Reply
User:Pelagic, can you clarify? Assuming you mean it’s deployed here, what has changed? When I expand all languages, I still see English, Canadian English, Old English, British English, and Jamaican Creole English for labels, descriptions, and alt labels. Do you mean that “English” is now associated with the code en-US and there is no longer just en? That does not sound good. —Michael Z. 04:20, 11 January 2022 (UTC)Reply
Oops, never mind. I see the label as you describe it in string properties. Thanks for the heads up. —Michael Z. 04:22, 11 January 2022 (UTC)Reply
It’s not good because if I type Eng*, then just en disappears from the autocomplete list, because it lacks the full name “English.” —Michael Z. 04:24, 11 January 2022 (UTC)Reply
I suppose the label/description/alt should be added for consistency, but it will remain a mystery how it’s to be used. I’ve been removing regional English labels and descriptions unless there’s a word with a spelling distinction. —Michael Z. 04:27, 11 January 2022 (UTC)Reply

Capitalization of taxon common names in labels and aliases

There seem to be very different capitalization practices in labels and aliases for plant and animal common names in Wikidata items, often depending on the type of taxon and who is recording the label. I have been following the guidance that says "Labels begin with a lowercase letter except for when uppercase is normally required or expected. Essentially, you should pretend that the label is appearing in the middle of a normal sentence, and then follow normal language rules. Most terms would not be capitalized if they appeared in the middle of a sentence, however proper nouns such as the names of specific people, specific places, specific buildings, specific books, etc., should be capitalized." I interpret this to mean not to capitalize a name like "bald eagle" or "brown booby" and to only capitalize proper nouns and adjectives, such as "American robin." And yet there are editors who consistently change labels and aliases to all capitalized words. The argument that I have heard is that these names are regularly capitalized in standard reference sources for these groups. General reference sources, such as Encyclopaedia Britannica and language dictionaries, do not capitalize them. To avoid editing wars, could we please have a policy that tells us what to follow? It would be very helpful to have a standardized capitalization practice here that could be pointed to. UWashPrincipalCataloger (talk) 03:39, 2 January 2022 (UTC)Reply

You are absolutely correct; what "standard reference sources" use as a label has no bearing on how we label here (in English). ArthurPSmith (talk) 18:38, 3 January 2022 (UTC)Reply
Thanks, glad someone agrees. Unfortunately, particularly for birds and plants, there are editors who keep changing the label to capitalized. For example, view the history of Bald Eagle (Q127216) for an example of someone who keeps reverting lowercase for bird names. UWashPrincipalCataloger (talk) 21:55, 3 January 2022 (UTC)Reply

It was my understanding that they should be lower case though I could see why someone would think it is a proper noun and try to capitalize it. I don't know how we can stop people from making these changes though. BrokenSegue (talk) 02:32, 4 January 2022 (UTC)Reply

Well if we could at least provide specific instructions on this in the Help:Label and Help:Aliases guidelines, with some examples, that could be pointed to and hopefully dissuade changes. It would be both educational and a way to report problems. UWashPrincipalCataloger (talk) 02:42, 4 January 2022 (UTC)Reply

There was consensus for this with regards to birds on en wiki back in 2014, which upended the previous status quo: en:Wikipedia talk:Manual of Style/Archive 156#Bird common name decapitalisation. It is not surprising to see the same thing occur here, since several people were not happy about it. —Xezbeth (talk) 06:13, 4 January 2022 (UTC)Reply

"Labels begin with a lowercase letter except for when uppercase is normally required or expected" - uppercase is expected in the official standard English vernacular names of plants and animals (including in the middle of sentences); see the referenced names supplied by IUCN (e.g. Bald Eagle, Dwarf Siberian Pine), MSW (e.g. Brown Bear), Wörterbuch der Säugetiernamen - Dictionary of Mammal Names (Q27310853), IOC, etc. The labels should use exactly the same orthography as these standard international organisations use, and not try to force a different style for them which is not actually used. It is completely wrong (as I have seen this complainant do) to insist on changing to lower case out of misplaced dogmatism when the cited reference itself uses uppercase (both in the title, and in the body of the text). By forcing this dogma, you also risk alienating many of the most knowledgeable editors with experience of the organisms concerned. That is exactly what happened on en:wp when a bunch of contributors with minimal knowledge of birds took it on themselves to enforce decapitalisation (example, from a highly experienced former editor). The wikipedia bird articles have never recovered from this loss; errors, nonsense, and even some vandalism crept in, and was never removed because the more knowledgeable editors had left in disgust. Take a look here too. @Succu: you might wish to comment, too.
@Xezbeth: - there was no consensus there; a guillotine closure of the discussion was made after consensus could not be reached. That is not the same as consensus!
Note that capitalisation of English vernacular names has a long history; it is very far from a modern idea. See e.g. Philip Miller's 1768 Gardeners Dictionary, or Latham's 1781 A General Synopsis of Birds. - MPF (talk) 16:00, 4 January 2022 (UTC)Reply
Thanks, Xezbeth, I came here to point out the same thing; you beat me to it. Though I was going to write “massive stoush” in place of “consensus”. 😉 One argument is that if one writes in running text “I saw a blue tit”, how do you know that blue is part of the name and not a common adjective? If I write that I “sighted a red-tailed black cockatoo” does it mean some kind of cockatoo with black and red plumage, some species of Calyptorhynchus s.s., or am I making a definite assertion that it is surely a Red-Tailed Black Cockatoo (or “Red-tailed Black Cockatoo”?!, Q638242) and not a Glossy Black Cockatoo (Q790668, which is also black with red tail markings)? Taken to the extreme, you end up with constructions like “Lion has been observed to hunt Gnu.” But Wikidata labels are not running text. The other argument is that some majority (but not all) of ornithological name lists and reference works use title case. We have taxon common name (P1843) or even official name (P1448) for that. I would say: preserve case and attach references in P1843; keep the label as lowercase. ⁓ Pelagic (messages) 16:43, 4 January 2022 (UTC)Reply
(edit conflict ... or rather discussion tools simultaneous post: I started writing my comment before MPF's appeared) ⁓ Pelagic (messages) 16:48, 4 January 2022 (UTC)Reply
[@Pelagic: I indented your post a bit more to make it clearer] I think using different capitalisation for P1843 and labels is very dodgy ground. I remember one case a year or two ago following the discovery of a new species of bird; the entire global source information all used capital first letters (as per expected usual), and en:wp was the first place where its name was ever given in lowercase, anywhere. That change of the name is surely a breach of the 'no original research' clause? It also leaves open very difficult decisions as to what constitutes a 'proper noun'; this is very far from easy in many cases, as it requires detailed information on the etymology of a name, which can be very obscure, and often counter-intuitive. Or are you suggesting that labels should always be lower case (cooper's hawk, canada goose, etc.)? - MPF (talk) 17:53, 4 January 2022 (UTC)Reply
@MPF: If we store it as Cooper's hawk then it is easy to transform to title case Cooper's Hawk. The reverse isn't true. E.g. with CSS, <span style="text-transform: capitalize;">Cooper's hawk</span> renders as Cooper's Hawk. (And ... which code point do we use for the apostrophe? Let's save that one for another time.) ⁓ Pelagic (messages) 00:37, 5 January 2022 (UTC)Reply
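The asymmetry claimed above can be illustrated with a short Python sketch. The helper `to_title_case` is a hypothetical name introduced here; it capitalizes the first letter of each space-separated word, roughly what CSS `text-transform: capitalize` does for these examples. Python's built-in `str.title()` is deliberately avoided because it also capitalizes the letter after an apostrophe.

```python
def to_title_case(label: str) -> str:
    """Capitalize the first character of each space-separated word, leaving the rest untouched."""
    return " ".join(word[:1].upper() + word[1:] for word in label.split(" "))


# Going from the sentence-case label to title case is mechanical.
print(to_title_case("Cooper's hawk"))
print(to_title_case("American robin"))

# Going back is lossy: lowercasing everything also lowers the proper noun,
# and nothing in the string says which words were proper nouns to begin with.
print("Cooper's Hawk".lower())
```

Note that this sketch turns "red-tailed black cockatoo" into "Red-tailed Black Cockatoo", which is only one of the two capitalized forms mentioned earlier in this thread, so even the "easy" direction involves a style decision for hyphenated names.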
@Pelagic: - why should that be so, and what's the significance of this? - MPF (talk) 01:31, 5 January 2022 (UTC)Reply
None of those (with the possible exception of Britannica) carries the level of international authority that IUCN or IOC do; they're all just second-rate re-users of information, not leaders, not naming authorities - MPF (talk) 20:08, 4 January 2022 (UTC)Reply
I get the impression that caps versus no-caps isn't a case of first- or second-rate, more of an in-group (ornithology and bird enthusiasts) versus out-group thing. Also, it doesn't seem to me that capitalization is as important to those working on, say, mammals or molluscs. (You might find some publications that write “Mulberry Whelk” instead of “mulberry whelk” for Tenguella marginalba Q14468825, but nobody's going to look at the lowercase version and find that it offends their sensibilities. What if you call it white-lipped mulberry whelk, does the desire to add caps get stronger for multiple words?)
Probably the other fields don't have an equivalent of Gill & Donsker. Common names in zoölogy are normally a matter of usage rather than some being Official. I remember when the masked lapwing was known as the spur-winged plover (and notice how that sentence is readable without the caps, though I'd normally use quotes or italic for use-mention distinction). If Masked Lapwing is the official name and spur-winged plover is a now-unofficial common name, does the former get caps and the latter not?
I will note that IOC/G&D says “Official English names of birds are capitalized, as is the current practice in ornithology” [3]. So yes, this is stated explicitly, but the scope is “official English names” and “in ornithology”.
⁓ Pelagic (messages) 00:11, 5 January 2022 (UTC)Reply
Conversely, 'capitals for proper names only' does create 'first-rate' birds (those derived from "proper names") and 'second-rate' "inferior" birds (those not so); yet all birds are equal, so they should all be given equal, consistent capitalisation, and not discriminated between on the basis of etymology. Note that Masked Lapwing (Q839366) Vanellus miles and Spur-winged Lapwing (Q749126) Vanellus spinosus are two different species; I suspect you must be thinking of something else? But yes, capitals generally only for official names, not for unofficial nicknames that are not given in standard lists. I guess in this lies the difference: there are long-standing official lists for birds and mammals, not, or only more recently, for molluscs (though recent lists in UK at least, do use caps for molluscs, insects, etc.). I'm not very familiar with molluscs, but if there is a White-lipped Mulberry Whelk, there are presumably other related species (hypothetical: Red-lipped Mulberry Whelk), and 'mulberry whelk' is just a generic name, rather than a particular species? Like 'rabbit' (cited in the labels guidelines) can refer to multiple species and is correctly lower case, whereas 'European Rabbit' refers specifically to Oryctolagus cuniculus (and only to Oryctolagus cuniculus), and is thus capitalised to indicate it is an official name.
"nobody's going to look at the lowercase version and find that it offends their sensibilities" - what the lowercase version does look like is sloppy, poorly-written, unprofessional. Something you'd expect in a facebook post, not a textbook. Not what one should expect in wikidata. - MPF (talk) 01:31, 5 January 2022 (UTC)Reply
And yet: https://www.sciencedirect.com/science/article/abs/pii/S0048969716304314 and https://www.publish.csiro.au/pc/PC18003 and https://www.publish.csiro.au/wr/wr20129 ("masked lapwing") lowercase throughout. https://link.springer.com/article/10.1007/BF03194271 and https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2907.2009.00140.x and https://www.sciencedirect.com/science/article/abs/pii/S1439179106000880 ("European rabbit") lowercase throughout. So these are poorly-written, unprofessional journal articles? UWashPrincipalCataloger (talk) 07:45, 5 January 2022 (UTC)Reply
The senior editor of Ibis (Q1666662) has said so of their house styles, yes. - MPF (talk) 09:29, 5 January 2022 (UTC)Reply
MPF said "they're all just second-rate re-users of information, not leaders, not naming authorities" - as is Wikidata - or perhaps we're third-rate. In any case, the express purpose of labels here is to provide the form that would be used in common ("second rate") texts. It is the name by which this entity is commonly known and would be used in ordinary writing. As Pelagic indicated, it's easy enough to capitalize a name, as is needed generally in English at the start of sentences or in titles. ArthurPSmith (talk) 18:32, 5 January 2022 (UTC)Reply
Another point of interest might be Google ngram data on 'Cooper's hawk', for instance, where the lowercase spelling has been more common recently. Also, upper case there would probably be overweighted by use in titles etc. ArthurPSmith (talk) 18:38, 5 January 2022 (UTC)Reply

Suggestion: link to oldids for stability

For example, the following is confusing:

Reflect common usage

[...] Examples:

  • The label of common sunflower (Q171497) is the common name, while the scientific name (Helianthus annuus) is featured as an alias.
  • Multiple common names exist for association football (Q2736). The most common name is picked for the label while the other ones are listed as aliases.

The first contradicts itself, and the second is probably the third-most common name (leaving aside for now the discussion above about how that's a judgment call anyway), albeit the most unambiguous and 'proper'. The problem I have though is not with a given example, but that it has the potential to shift out from under the help guide over time, weakening the guidance. Arlo Barnes (talk) 22:34, 25 February 2022 (UTC)Reply

Using Unicode for formatting?

Since HTML or wikitext formatting is not allowed in the label, the documentation says: "Instead unicode characters can be used." I wonder what this means. For example, Unicode provides 𝘪𝘵𝘢𝘭𝘪𝘤𝘴 characters but they are meant for mathematics, and should of course not be used for any other purpose. Another example: many labels for French historical characters include the century, which should use a superscript e: Pierre Corneille should be a "dramaturge et poète français du XVIIe siècle" with the "e" set in superscript, not "... du XVIIe siècle" with a plain "e". Unicode provides a phonetic superscript ᵉ, but again, this character would be semantically wrong in this context. So unless I'm wrong, Unicode characters should not be mentioned at all here. Seudo (talk) 00:06, 11 April 2022 (UTC)Reply

+1 from me on this, yes. Unicode abuse is quite annoying! ArthurPSmith (talk) 14:54, 11 April 2022 (UTC)Reply
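The semantic point raised above can be checked against the Unicode character database. The following is a minimal Python sketch using only the standard `unicodedata` module; the helper name `describe` is made up for illustration, and nothing here is a statement about how Wikidata itself processes labels.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return the official Unicode name of ch and its NFKC-normalized form."""
    return unicodedata.name(ch), unicodedata.normalize("NFKC", ch)


italic_i = "\U0001D62A"   # the first letter of the styled word quoted above
superscript_e = "\u1D49"  # the superscript e quoted above

for ch in (italic_i, superscript_e):
    name, folded = describe(ch)
    # Print the code point, its official name, and the plain letter it folds to.
    print(f"U+{ord(ch):04X} {name} -> {folded!r}")
```

The official names place the first character among the mathematical alphanumeric symbols and the second among the modifier letters, which supports the objection that neither is a general-purpose formatting device. Compatibility normalization (NFKC) folds both to plain letters, so a label written with them is a different string from the plain-letter form unless the consumer normalizes first.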

Drafting of guidelines for new language code mul

Based on a long-standing community request there will soon be a new language code on Wikidata for labels, descriptions, and aliases on Test Wikidata: “mul”, a special language code meaning “multiple languages”. It is intended to replace the current duplication of certain labels and aliases in many languages: instead of the given name Douglas (Q463035) having the label “Douglas” in hundreds of Latin-script languages, it should be enough to add it once as the “mul” label and have all other languages falling back to that (before, as usual, falling back to “en” as a last resort). This should reduce the amount of redundant data in Wikidata, and relieve some pressure from the query service.

For more details about the new language code, see phab:T285156.

The technical implementation of the new language code is currently being tested on test.wikidata.org, see phab:T297393.

Introducing the new language code on Wikidata will very likely require an adjustment of Wikidata's guidelines and help pages. Editors might e.g. expect an explanation of the purpose of the new language code and a clarification of how it should be used. We created this section as a possible discussion space for editors that would like to contribute to drafting new preliminary guidelines and help pages.

Cheers, Mohammed Sadat (WMDE) (talk) 15:47, 25 April 2022 (UTC)Reply

Some initial thoughts I have on the implementation of this code are as follows:
  • Label strings which occur most frequently among languages should be moved to mul 'with all deliberate speed'.
    • This thus means that all current labels which duplicate the en label on 1) astronomical objects, 2) scientific articles, 3) Unicode characters, and 4) genes/proteins should be removed in favor of using the mul label, and then the en labels on those items should be removed in favor of mul as well.
    • The above list is certainly not complete, but those are in my view the most uncontroversial migrations.
    • The above shift should occur without regard to whether languages use or do not use the Latin script.
    • Among other migrations might be 5) taxa labels that duplicate any P225 values (such values being copied to mul), 6) the most frequent Latin-script transcriptions of place names, and 7) the most frequent Latin-script transcriptions of names of humans.
  • If those users and bot accounts who are currently adding duplicate labels for these items do not get the hint, they should be blocked until such migrations are complete.
  • Any attempt to add a label which duplicates the mul label or a mul alias, or to add an alias which performs similar duplication, should be rejected on the server side without exception, perhaps suggesting that the mul label be adjusted if affecting the labels of multiple languages is desired.
    • Others who may not get the hint and change all language labels except the mul label should be summarily blocked (for a shorter period of time than the block mentioned above).
  • It should be emphasized to users, in light of the above, that a language may have a label separate from the mul one if it has a reason to be meaningfully distinct from the most common label across other languages.
    • In particular, it should be clearly noted that the mul code exists to help avoid exact duplication of data, rather than to 'stifle support of certain languages' or anything else similarly misguided.
Mahir256 (talk) 16:10, 25 April 2022 (UTC)Reply
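The duplicate-rejection rule proposed in the list above could be sketched roughly as follows. This is only an illustration in Python: the function name `duplicates_mul` and the dictionary layout for an item's terms are assumptions made for this example, not the Wikibase data model or any existing server-side check.

```python
def duplicates_mul(terms: dict, lang: str, new_label: str) -> bool:
    """Return True if new_label for lang would repeat the mul label or a mul alias.

    terms is a hypothetical mapping: language code -> {"label": str, "aliases": [str]}.
    """
    if lang == "mul":
        # Editing the mul entry itself is never a duplicate of mul.
        return False
    mul = terms.get("mul", {})
    existing = {mul.get("label"), *mul.get("aliases", [])}
    existing.discard(None)  # an item may have mul aliases but no mul label
    return new_label in existing


# Illustrative data only, loosely modelled on the "Douglas" example above.
terms = {"mul": {"label": "Douglas", "aliases": ["Doug"]}}

print(duplicates_mul(terms, "de", "Douglas"))  # exact duplicate of the mul label
print(duplicates_mul(terms, "ga", "Dúghlas"))  # meaningfully distinct label
```

The comparison is deliberately an exact string match, in line with the stated aim of avoiding exact duplication only; a language whose label differs in any way would still be accepted.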
  • Is English intended to be special somehow here? I think it would be fine to have English also fall back to "mul" if a separate English label is not provided, and the above restrictions suggested by Mahir256 should apply to the English label also. (By the way, I understand 'mul' for labels and aliases, but is it really an issue for descriptions also? That seems a bit unlikely to me, that descriptions would also be cross-language somehow). ArthurPSmith (talk) 18:58, 25 April 2022 (UTC)Reply
  • @ArthurPSmith: I was in favor of, and continue to be in favor of, having en and other languages fall back to mul, rather than mul falling back to en as the code on testwikidata is currently implemented. As for the four classes listed in that one sub-bullet, I've clarified that the en labels should be removed as well in favor of mul. I do not foresee mul being particularly useful for descriptions, although you may wish to opine at phab:T303677 regarding a related matter. Mahir256 (talk) 19:02, 25 April 2022 (UTC)Reply
  • Just to avoid misunderstandings: The current implementation follows the Translatewiki fallback chain before it falls back to mul. So English falls back to mul like any other language and is in no way special up to that point. Only if there is neither a Translatewiki fallback nor a mul fallback will we fall back to en as a last resort (this makes sense for practical reasons until English has lost its prime position in Wikidata's Labels). --Manuel (WMDE) (talk) 14:46, 26 April 2022 (UTC)Reply
  • I don't believe that gene/protein names are generally language independent. If you take a well studied gene like apolipoproteins E (Q424728) it's called 'apolipoproteins E' in English, 'Apolipoprotein E' in German, 'Apolipoprotéine E' in French, 'Apolipoproteiini E' in Finnish and a variety of different names in other languages. While the names are similar, they differ and thus mul is not a good solution for genes and proteins.
When it comes to humans and place names I would suggest moving their names to mul for all items where currently all labels are the same. ChristianKl 20:31, 7 May 2022 (UTC)Reply
No, and neither are astronomical objects or scientific articles. Unicode characters are the only ones which are applicable here. Even if the intent is not to stifle languages, the effect of removing "duplicates" would be exactly that as it would remove distinctions made in the database that aren't explicated anywhere else. We know that "j" and "w" definitely do not make the same sound from one language to the next, and if we say that "j" can be in the same place for mul *and its duplicates* that's information in itself. That's saying that the pronunciation is not important to understanding the string, that there are legitimate correct alternate pronunciations in other languages which happen to be spelled the same, or that a given language permits a pronunciation other than the one implied by the representation. It may be "common knowledge" that neighborhood is en-US and neighbourhood is en-GB, but to say noodle is en-US and noodle is en-GB is not redundant at all. Otherwise, how would one use data to demonstrate the null hypothesis that noodle is invariable in spelling across dialects? Middle river exports (talk) 19:53, 16 August 2022 (UTC)Reply
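For readers following the fallback discussion in this thread, the lookup order described by Manuel (WMDE) can be sketched as: the requested language, then its language-specific fallbacks, then mul, then en as a last resort. The Python below is a minimal sketch of that order only; the function name `resolve_label` and the small `FALLBACKS` table are assumptions for illustration and do not reproduce the actual Wikibase implementation or the real fallback data.

```python
# Hypothetical per-language fallback table, for illustration only.
FALLBACKS = {"de-at": ["de"]}


def resolve_label(labels: dict, lang: str, fallbacks: dict = FALLBACKS):
    """Return (language code, label) for the first code in the chain that has a label.

    Chain: requested language, its own fallbacks, then "mul", then "en".
    Returns None if no code in the chain has a label.
    """
    chain = [lang, *fallbacks.get(lang, []), "mul", "en"]
    for code in chain:
        if code in labels:
            return code, labels[code]
    return None


# Illustrative item with only a mul label and an English label.
labels = {"mul": "Douglas", "en": "Douglas (given name)"}

print(resolve_label(labels, "de-at"))  # no de-at or de label, so mul is used
print(resolve_label(labels, "en"))     # an explicit en label wins for English
print(resolve_label({"en": "Douglas"}, "fr"))  # neither fr nor mul: last resort en
```

In this ordering English is not special until the final step: a request for `en` with no English label would also land on the mul label, which matches the description given in the thread.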

“Descriptive” proper nouns

I am looking for some guidance on how to label entities such as government bodies (e.g. the Danish Ministry of Finance), positions (the Danish Minister for Finance), legal acts (the Danish Copyright Act), organisations (Danish Osteoporosis Society) etc. in their non-native language. Some of these have an official/commonly used name in English, others have not.

If a Danish institution has an official English name, it often includes the nationality (“Danish”) even if the native name does not (e.g. English name: “Danish Osteoporosis Society” vs. Danish name: “Osteoporoseforeningen” (literally: “the osteoporosis society”)). The reasoning is that the English name will often be used in an international context, perhaps together with other similar national institutions (the French Osteoporosis Society etc.).

Even without an official name, such “descriptive proper nouns” are easy to translate to any language. So my question is: Is it preferable to include the nationality in English labels for entities with such “descriptive names” to prevent ambiguity with similar entities from other countries (the Danish Ministry of Finance vs the German Ministry of Finance), or should the English label (if there is no official name) just be a literal translation of the native name without pre-/appending the nationality, so the nationality is only mentioned in the description? --C960657 (talk) 06:01, 1 September 2022 (UTC)Reply

I think if there is no official translation then a more descriptive label is probably helpful. However, name ambiguity in Wikidata is generally supposed to be resolved in the description field, not the label, so I don't think this is really a critical issue. Also I'm not sure a literal translation is actually correct in many cases - for some institutions with no official English name the native-language name is fine (Université de Montréal for example). If it's possible to determine I think the best English label would be whatever is most often used for the item in English language publications. ArthurPSmith (talk) 15:53, 1 September 2022 (UTC)Reply

Disambiguation information belongs in the description -- rationale

This is stated without rationale so I will provide one: one of the functions of Wikidata is to serve machine interpretation of text. The labels, both the main label and the other ones, must be exactly what the machine sees in the text (and what a human sees in the text). In thesauri terminology, the main label corresponds to "preferred term" and the other labels correspond to "non-preferred term". Each term needs to be attested in use. Adding brackets to the labels to disambiguate would be useful for some purposes, but not for the stated purpose. The description field does not have this limitation.

Thus, the set of labels for a concept/entity is a set of actually used synonyms referring to the concept or entity. Adding brackets would create a label that is either never or very rarely used.

--Dan Polansky (talk) 16:23, 22 December 2022 (UTC)Reply

Translation

If some item has no "official" (used in authorities) name in some language, is it allowed to just translate a label from another language? Infovarius (talk) 22:17, 27 December 2022 (UTC)Reply
