Part of Nobumi Iyanaga's website. n-iyanag@ppp.bekkoame.ne.jp. 12/29/02.

From Unicode to Styled Text conversion

My Unicode and MacOS, and Code converters page, which was first uploaded on September 26th, 2000, is now very dated and should be updated entirely. As I don't have time to do so for now, this page only provides a small new tip with which you can easily convert Unicode text files into legacy-encoding files that use special diacritical font(s) for the transliteration of East Asian text.

As I already wrote in my Unicode and MacOS, and Code converters page, it is easy to convert Unicode text (or text files) into multilingual styled text (files) using TEC OSAX; but if the original text contains special diacritical characters, such as vowels with a macron or consonants with a dot below, they will be rendered as "?" in the converted text. Even if we use the Perl script Uni2Multi created by Nowral-san, these characters will be converted into their hexadecimal representations.
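The same kind of loss can be reproduced with any converter that targets MacRoman. The following is only an analogy using Python's built-in "mac_roman" codec; it is neither TEC OSAX nor Uni2Multi, and the exact hexadecimal notation that Uni2Multi writes may differ from the escape shown here.

```python
text = "Ky\u014dto"  # "Kyoto" written with o-macron (U+014D), which MacRoman lacks

# Unmappable characters are replaced by "?"
print(text.encode("mac_roman", errors="replace"))

# Unmappable characters are replaced by a hexadecimal escape
print(text.encode("mac_roman", errors="backslashreplace"))
```

In both cases the macron vowel is lost; only a table that maps U+014D to a glyph of a diacritical font can preserve it.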

But it is in fact easy to customize this script, once we know how it works. The script comes with a folder named "APPLE", which gathers all the conversion tables between Apple's legacy encodings (which use Apple's Language Kits) and Unicode. At the bottom of the script, there are some data lines like the following:

ROMAN.TXT 2
JAPANESE.TXT 16384
CHINTRAD.TXT 16896
CHINSIMP.TXT 28672
SYMBOL.TXT Symbol
DINGBATS.TXT Dingbats
KOREAN.TXT 17408
ARABIC.TXT 17920
CENTEURO.TXT
CORPCHAR.TXT
CROATIAN.TXT

This is tab- and return-delimited data: each line corresponds to a script (MacRoman, Japanese, etc.). The first field is the name of the file containing the conversion table, which the Perl script reads; the second field is either the font ID number or the font name of the default font used for the corresponding script. Some of the scripts lack this second field; in these cases, the conversion table files will not be read.
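The way these data lines are interpreted can be sketched as follows. This is only an illustration in Python, not the actual code of Uni2Multi (which is written in Perl); the function name parse_data_lines and the details of whitespace handling are my assumptions.

```python
def parse_data_lines(text):
    """Return (table_file, font) pairs from tab-delimited data lines.

    font is an int when the second field is a font ID number and a str when
    it is a font name. Lines without a second field are skipped, since their
    conversion tables are not read.
    """
    entries = []
    for line in text.splitlines():  # handles both "\r" and "\n" line ends
        fields = line.split("\t")
        table_file = fields[0].strip()
        font = fields[1].strip() if len(fields) > 1 else ""
        if not table_file or not font:
            continue  # no default font given: this table is not used
        entries.append((table_file, int(font) if font.isdigit() else font))
    return entries


sample = "ROMAN.TXT\t2\nJAPANESE.TXT\t16384\nSYMBOL.TXT\tSymbol\nCENTEURO.TXT\n"
for table_file, font in parse_data_lines(sample):
    print(table_file, repr(font))
```

Here "CENTEURO.TXT" is dropped because it has no font field, while "2" and "16384" are kept as numbers and "Symbol" as a name.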

So, what is needed is to make a conversion table for the diacritical font that you use, place that file in the folder named "APPLE", and add the name of that file, together with the name of the font, to these data lines. Suppose that you usually use the font named "Appeal": you make a conversion table between the glyphs of that font and Unicode, name it "Appeal.TXT", and save it in the "APPLE" folder; then you add this single line after "ROMAN.TXT    2":

Appeal.TXT    Appeal
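If you prefer not to edit the script by hand, the insertion can be sketched like this in Python. The helper add_font_line is hypothetical (it is not part of Uni2Multi), and it assumes that the data lines have already been read into a list of tab-delimited strings.

```python
def add_font_line(data_lines, table_file, font_name, after="ROMAN.TXT"):
    """Return a copy of data_lines with a new tab-delimited entry inserted
    right after the entry whose first field equals `after`."""
    result = []
    inserted = False
    for line in data_lines:
        result.append(line)
        if not inserted and line.split("\t")[0].strip() == after:
            result.append(f"{table_file}\t{font_name}")
            inserted = True
    if not inserted:
        raise ValueError(f"no data line for {after!r} found")
    return result


lines = ["ROMAN.TXT\t2", "JAPANESE.TXT\t16384"]
print(add_font_line(lines, "Appeal.TXT", "Appeal"))
```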

Now, how do you create these conversion tables? This is easy too, because we already have tables of correspondences between different diacritical fonts and Unicode (see my other page, East Asian Diacritical Fonts and Unicode). I wrote a Nisus macro which generates the needed conversion tables from these tables of correspondence in Nisus format (which you can download from http://www.bekkoame.ne.jp/~n-iyanag/researchTools/diacritical_fonts_download/nisus_tables.sit.hqx). These conversion tables, for the fonts "Appeal", "ITimesSkRom", "Hobogirin", "MyTimes", "Norman", "Normyn" and "Times_Norman", have this format:

hex of diacritical font    hex of Unicode equivalent    #glyph's description

Any line beginning with "#" will be ignored by the Perl script. The tables generated automatically with this macro are perhaps not perfect, but they can certainly be used. You can download these tables, along with the Nisus macro named "diacriticalFontTable_macro", from here (40 KB).
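To show how a table in this format can be read, here is a minimal sketch in Python. It is not the Perl code of Uni2Multi. It assumes that the fields are separated by tabs and that the hexadecimal values may or may not carry a "0x" prefix; the function name load_table is hypothetical, and the two sample rows are made up, not taken from any real font table.

```python
def load_table(text):
    """Map Unicode code points to font byte values from a conversion table."""
    unicode_to_font = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a mapping line
        font_code = int(fields[0], 16)     # accepts "0x88" as well as "88"
        unicode_code = int(fields[1], 16)
        unicode_to_font[unicode_code] = font_code
    return unicode_to_font


# Made-up sample rows; the byte values do NOT come from a real font.
sample = (
    "# hypothetical table\n"
    "0x88\t0x0101\t#LATIN SMALL LETTER A WITH MACRON\n"
    "0x99\t0x014D\t#LATIN SMALL LETTER O WITH MACRON\n"
)
table = load_table(sample)
print({hex(u): hex(f) for u, f in table.items()})
```

The mapping is keyed by the Unicode value because the conversion goes from Unicode to the font; for the opposite direction you would simply swap the key and the value.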

I hope this tip, and the conversion tables and the macro, will be useful for some users.

Please write to me if you have any suggestions, feedback or bug reports.
Have fun, and thank you!


Mail to Nobumi Iyanaga


This page was last built with Frontier on a Macintosh on Sun, Dec 29, 2002 at 11:25:49 PM. Thanks for checking it out! Nobumi Iyanaga