MediaWiki talk:Gadget-SimpleTransliterate.js

Suggestion

It should work with diffs, too. Great tool! --Ricordi samoa 23:37, 2 November 2013 (UTC)

Done –ebraminio^talk 10:03, 3 November 2013 (UTC)

More languages

~~Support for more languages should be requested here; then I can easily import them into the gadget. You can either vote or comment and suggest a new language on that bug.~~ –ebraminio^talk 10:04, 3 November 2013 (UTC) (edited on 22:14, 5 September 2014 (UTC))

You can propose any improvement or additional script support here. It would be very helpful if you could provide a simple one-to-one dictionary mapping single characters to their English transliteration, like the list we have here. –ebraminio^talk 22:14, 5 September 2014 (UTC)
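To illustrate the kind of dictionary being asked for, here is a minimal sketch of a one-to-one character table and a loop that applies it. The names `sampleDict` and `transliterateSimple` are hypothetical, and the three entries are only examples, not the gadget's actual table.

```javascript
// Hypothetical sample table: one source character -> its Latin transliteration.
var sampleDict = {
  "ヤ": "ya",
  "Ж": "Zh",
  "ж": "zh"
};

// Replace each character found in the table; leave all others unchanged.
// Note: charAt() walks UTF-16 code units, so this sketch only covers
// characters in the Basic Multilingual Plane.
function transliterateSimple(text, dict) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    out += Object.prototype.hasOwnProperty.call(dict, ch) ? dict[ch] : ch;
  }
  return out;
}
```

A proposal for a new script would then only need to supply more entries in the same key/value shape.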

Japanese katakana "ya" missing

In this script, the Japanese katakana "ya" (ヤ) is missing. The script includes the small katakana "ya" (ャ), but not the full-size one. --Zerabat (talk) 13:50, 14 May 2014 (UTC)

@Zerabat: Thank you! –ebraminio^talk 14:43, 3 June 2014 (UTC)

In Japanese entities - pinyin

Hi, since the tool probably cannot distinguish between Chinese and Japanese, could it skip transliteration for Japanese entirely? That is: if the link goes to a Chinese wiki project (e.g. zh.wikipedia), transliterate to pinyin, but if it goes to a Japanese wiki project (e.g. ja.wikipedia), show no transliteration (neither Japanese nor pinyin). For the Japanese title 此花区 [konohanaku] it shows cihuaqu, which is (1) totally different and (2) simply wrong. --Kusurija (talk) 06:30, 4 June 2020 (UTC)

So what?? Any ideas?? --Kusurija (talk) 12:26, 17 June 2020 (UTC)

@Ebrahim: An additional idea would be to detect hiragana and katakana to provide a Japanese reading. NMaia (talk) 06:15, 27 August 2020 (UTC)

Omitting transliterations for Japanese entirely is definitely not a constructive solution; in my personal experience the gadget has proved useful for Japanese titles as well. This is just a personal gadget designed for people who wanted to resolve cross-wiki conflicts (at the time such things were common). It isn't perfect for every script, so we should lower our expectations of it. However, code contributions are welcome; for example, I don't know how to detect hiragana and katakana at all, so someone familiar with them should implement the feature. Thanks −ebrahim^talk 21:39, 6 September 2020 (UTC)
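On detecting hiragana and katakana: one possible approach, sketched here as an assumption rather than anything the gadget currently does, is a regular expression over the two Unicode blocks. The names `KANA_PATTERN` and `containsKana` are hypothetical.

```javascript
// Hiragana occupies U+3040-U+309F and katakana U+30A0-U+30FF in Unicode,
// so a single character class covers both blocks.
var KANA_PATTERN = /[\u3040-\u30FF]/;

// True if the text contains at least one hiragana or katakana character.
function containsKana(text) {
  return KANA_PATTERN.test(text);
}
```

This only helps for titles that actually contain kana. A kanji-only title such as 此花区 has none, so a check like this cannot tell it apart from Chinese text.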

@Opencooper: Would you perhaps know how to help in this case? NMaia (talk) 03:33, 8 February 2021 (UTC)

@Ebrahim, NMaia: If I'm understanding correctly, this userscript maps non-English characters one-by-one to a chosen transliteration. And the problem seems to be that currently it's transliterating Japanese text as Chinese? Fixing this would require having a separate mapping dictionary/logic for Japanese.

Detecting the language is actually really simple since the interface tags each site link with a lang attribute, so you can just do var lang = $(link).attr("lang"); and check if the code is "zh" or "ja" to use the appropriate mapping.

As for accurate transliteration, that's another beast entirely… I don't think the one-to-one approach can work with Japanese because characters can have multiple possible pronunciations. I quickly looked up how some libraries do it, and they rely on tools like MeCab, but we only have basic JavaScript at our disposal. The Japanese Wikipedia actually has the pronunciation for most articles and this could be parsed. I've written a script that fetches these that can be used for guidance if it's decided that's the approach to take. For now at least, the Chinese mappings could be separated and only used if the lang code is "zh". (please ping me for any mentions) Opencooper (talk) 05:06, 8 February 2021 (UTC)
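A minimal sketch of the last suggestion, using the Chinese mappings only when the language code is "zh". The table `hanToPinyin` and the function `pickHanTable` are hypothetical names, the three entries are taken from the 此花区 example above, and the real gadget keeps its mappings in a different structure.

```javascript
// Hypothetical separated table for Han characters read as Mandarin.
var hanToPinyin = { "此": "ci", "花": "hua", "区": "qu" };

// Return the table to use for a site link's Han characters,
// or null to leave them untransliterated.
function pickHanTable(lang) {
  // Codes may carry subtags such as "zh-Hant", so compare the primary subtag.
  var primary = String(lang || "").toLowerCase().split("-")[0];
  return primary === "zh" ? hanToPinyin : null;
}
```

In the page itself the code would come from the link, for example `pickHanTable($(link).attr("lang"))`, so a link tagged "ja" would get no pinyin.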

Opencooper: If it doesn't make it much more complicated (just incorporating lang="" to improve the situation at least a bit), sure, why not :) I know it isn't perfect (no script's support is, I'd say), but I think we can agree it is better than nothing. Thanks :) −ebrahim^talk 19:15, 8 February 2021 (UTC)

Cyrillic romanization choices

The Cyrillic letters are currently an odd mix of European-style and Anglo-American style romanizations. For example:

Є є = Je je
Ї ї = Ji ji
Й й = J j
Ж ж = Zh zh
Х х = Kh kh
Ц ц = C c
Ч ч = Ch ch
Ш ш = Sh sh
Щ щ = Shch shch
Ю ю = Yu yu
Я я = Ya ya

They should all be either European-style: je, ji/ï, j, ž, ch/x, c, č, š, šč, ju, ja; or all Anglo-American: ie/ye, i/ï/yi, i/ĭ/y, zh, kh, ts, ch, sh, shch, iu/yu, ia/ya. And they should conform to an actual system.

Is it possible to have separate tables for separate languages? If not, then possibly the only reasonable choice for a single system is ISO 9 (Q913336), a European system based on Russian Cyrillic and extended to virtually all languages.

But that would be a compromise. It would be better to have different tables for different languages, since they have their own phonology, and each has its own most commonly used system or systems. Ideally, we could use a consistent set of systems, like the ALA-LC romanization (Q603208) tables, which are available online and widely used in libraries and publishing, or a more accessible simplified version without the special characters. Alternatively, we could follow English-language Wikipedia’s usage.

I’d be glad to provide references and edit the code, if we can agree on the requirements. —Michael Z. 23:11, 1 January 2021 (UTC)

Michael: If it doesn't make it much more complicated, sure why not! :) −ebrahim^talk 08:40, 8 February 2021 (UTC)

Change request: Cyrillic characters

{{Edit request}}

Please make the following changes:

Add comments to mark the start and end of Cyrillic-script characters: from lines 356 (Ѐ) to 585 (ӹ).

For consistency with most of the transliterations, change Croatian-style j- to y-, and c to ts, etc. Distinguish Ukrainian g and i. Use the conventional transliteration of the Belarusian semivowel. Specifically:

381    "Й": "Y",
413    "й": "y",
394    "Ц": "Ts",
426    "ц": "ts",

// Ukrainian
483    "Ґ": "Ġ",
484    "ґ": "ġ",
360    "Є": "Ye",
440    "є": "ye",
362    "І": "Ī",
442    "і": "ī",
363    "Ї": "Ï",
443    "ї": "ï",

// Belarusian
370    "Ў": "Ŭ",
450    "ў": "ŭ",

Overall the transliteration remains imperfect, because letters should have different values in different languages, but at least it will be consistent within words in more of the languages. (The Serbo-Croatian languages have their own rules, and someone familiar with them should review the relevant letters.) Thanks. —Michael Z. 16:08, 19 January 2021 (UTC)

Michael: Feel free to put your tested, modified version somewhere so I can just copy it over, thanks :) −ebrahim^talk 08:43, 8 February 2021 (UTC)

Thanks @Ebrahim:. I have copied this to User:Mzajac/Gadget-SimpleTransliterate.js and updated it. How to test it is not obvious, since this is invoked by a checkmark in Preferences > Gadgets, but I’ll try to figure it out. —Michael Z. 16:55, 8 February 2021 (UTC)

Okay, got it in my common.js and left the gadget on to compare. Seems to work. Will browse a lot of entries and see if anything is broken. —Michael Z. 17:01, 8 February 2021 (UTC)

Done Thanks! :) −ebrahim^talk 19:13, 8 February 2021 (UTC)

Thank you very much. That is an improvement for several languages. If someone implements language-based exceptions as mentioned above, I could take a crack at a proper Ukrainian romanization filter. —Michael Z. 19:53, 8 February 2021 (UTC)

@Mzajac: I added a language parameter to the main function that performs the transliterations (although this parameter is not presently used in the function itself), so if you wanted to mess around with language exceptions you can start experimenting with them. Mahir256 (talk) 21:00, 6 March 2021 (UTC)

You are awesome! I will have a peek soon. —Michael Z. 22:39, 6 March 2021 (UTC)

Buginese script

Hello, I notice Buginese script is not included here, so bug:Buginese Wikipedia articles are not transliterated. Can I add them to this list? Bennylin (talk) 18:58, 6 March 2021 (UTC)

@Bennylin: Done. Please check that the resulting transliterations are appropriate. Mahir256 (talk) 20:35, 6 March 2021 (UTC)

It's not perfect, but I suppose it will do for now within the constraints. Bennylin (talk) 17:56, 7 March 2021 (UTC)

Ukrainian romanization

Hi, @Mahir256:. I have built a beta version of a Ukrainian transliterator at User:Mzajac/Gadget-SimpleTransliterate.js. It’s feature-complete and seems to work, but I would like to test it on a wide set of text before putting it into production. I also used the nullish coalescing operator (??),[1] and should probably rewrite that for broader compatibility.[2]

In the meantime, if you have a moment, would you have a look and critique the code, particularly the way it hooks into the script? It lives outside the array loop, because it has to perform operations on letter sequences and word-initial letters. I suppose I could program this to run on the fully split string, but that code might be harder to follow.

Thanks. —Michael Z. 20:33, 8 March 2021 (UTC)
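On rewriting the nullish coalescing operator for broader compatibility: `a ?? b` yields `b` only when `a` is `null` or `undefined`, so it can be replaced by an explicit check. The helper name `coalesce` and the sample table below are hypothetical, not taken from the beta script.

```javascript
// Equivalent of `value ?? fallback` for engines without the ?? operator.
// Unlike ??, a function call always evaluates the fallback expression.
function coalesce(value, fallback) {
  return (value !== null && value !== undefined) ? value : fallback;
}

// Illustrative table in which one character is deliberately mapped to "".
var table = { "ь": "" };
var soft = coalesce(table["ь"], "ь");   // "" is kept; `||` would discard it
var other = coalesce(table["q"], "q");  // "q", because the entry is missing
```

The distinction from `||` matters for a transliteration table, because an empty-string entry is a valid result that should not trigger the fallback.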

Is anyone able to have a look at this? I have been using it for two months, and it works fine in my browser. I think it is ready for distribution. —Michael Z. 20:39, 5 May 2021 (UTC)

@Mzajac: If you can adjust your code so that characters not in the Ukrainian romanization set you provided at least fall back on the character mappings from the original algorithm, I'd be happy to integrate your transliterator into the main script. Mahir256 (talk) 20:54, 5 May 2021 (UTC)

@Mahir256: Do you mean it should apply the default transliteration to all non-Ukrainian-alphabet characters in a field that has language code uk? —Michael Z. 22:30, 5 May 2021 (UTC)

@Mzajac: Or at least to all characters not otherwise handled in the mappings "dict_uk_Latn" or "dict_uk_Latn_init" which you defined. Mahir256 (talk) 22:48, 5 May 2021 (UTC)
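The fallback being requested could look roughly like the sketch below. Only the names `dict_uk_Latn` and `dict_uk_Latn_init` come from this discussion. Their contents here, the stand-in `defaultDict`, and the functions `lookup` and `transliterateUk` are assumptions for illustration and do not reproduce the beta script.

```javascript
// Assumed shapes; the entries are illustrative only.
var dict_uk_Latn = { "я": "ia", "ц": "ts" };          // general positions
var dict_uk_Latn_init = { "я": "ya" };                 // word-initial positions
var defaultDict = { "я": "ya", "ц": "ts", "ы": "y" };  // stand-in for the main table

// Own-property lookup that returns undefined for missing entries.
function lookup(dict, ch) {
  return Object.prototype.hasOwnProperty.call(dict, ch) ? dict[ch] : undefined;
}

// Try the word-initial table, then the Ukrainian table, then the default
// table; characters found in none of them pass through unchanged.
function transliterateUk(text) {
  var out = "";
  var atWordStart = true;
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    var result = atWordStart ? lookup(dict_uk_Latn_init, ch) : undefined;
    if (result === undefined) { result = lookup(dict_uk_Latn, ch); }
    if (result === undefined) { result = lookup(defaultDict, ch); }
    out += (result === undefined) ? ch : result;
    // Simplification: only whitespace is treated as starting a new word.
    atWordStart = /\s/.test(ch);
  }
  return out;
}
```

With this ordering, a character outside the Ukrainian tables (here "ы") still receives the default transliteration, which is the behaviour described in the request.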

@Mahir256: I will have a look at that, probably in several days. I can appreciate the rationale, but there could be some weird results, since 1) many alphabets overlap in character repertoire with Ukrainian, and 2) the default scheme for Cyrillic uses rather different assumptions.

Can you point out some examples where it would be an improvement? —Michael Z. 01:34, 6 May 2021 (UTC)
