Part of Nobumi Iyanaga's website. n-iyanag@ppp.bekkoame.ne.jp. 1/31/07.

Conversion of Word 5.x files containing Chinese text

version 0.7.7
uploaded 07-01-31

Problem, and solution

As Chinese Mac's Applications page points out, MS Word 2004 can't handle Chinese text in documents from Word 5.x for Macintosh. Although the page on Word 5 on OS 9 says that SimpleText can serve as an intermediary, I was unable to get either of these applications to work as described there (I use the Japanese version of Word 2004). In any case, the work-around proposed there is cumbersome, and it would not be easy to preserve the style attributes in the process.

However, looking at the rtf code of Word 5.0/5.1 files containing Chinese text, I found a way to convert them into a more "modern" rtf format, which can be opened with (I think...) any version of Word above 5.x, and with other rtf-savvy editors or word processors for OS X, such as TextEdit, Jedit X or Nisus Writer Express. I also wrote an AppleScript droplet which automates the conversion process, except for two manual steps (see below).
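To give a rough idea of what this kind of conversion involves, here is a minimal sketch in Python. It is not the Perl script distributed on this page. It assumes that the Chinese characters appear in the old rtf as pairs of hex escapes (such as \'a4\'a4) and that they are encoded in Big5; both points are assumptions, and the function name to_unicode_escapes is made up for this illustration.

```python
import re

# Two consecutive RTF hex escapes, e.g. \'a4\'a4, treated here as one
# double-byte character (assumption: the legacy encoding is Big5).
PAIR = re.compile(r"\\'([0-9a-fA-F]{2})\\'([0-9a-fA-F]{2})")


def to_unicode_escapes(rtf_text, encoding="big5"):
    """Rewrite double-byte hex escape pairs as RTF \\uN? Unicode escapes."""

    def repl(match):
        raw = bytes.fromhex(match.group(1) + match.group(2))
        try:
            char = raw.decode(encoding)
        except UnicodeDecodeError:
            return match.group(0)  # not a valid pair: leave it untouched
        if len(char) != 1:
            return match.group(0)  # two single-byte characters, not one
        code = ord(char)
        if code > 0x7FFF:
            code -= 0x10000  # RTF expects a signed 16-bit decimal value
        # "?" is the fallback character shown by readers without Unicode support
        return "\\u%d?" % code

    return PAIR.sub(repl, rtf_text)


if __name__ == "__main__":
    print(to_unicode_escapes(r"Zhong \'a4\'a4 text"))
```

A real converter would also have to update the font table of the file and take care of escapes that belong to accented Roman letters rather than to Chinese characters; this sketch ignores both.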

New in version 0.7.7:
I fixed some bugs, and I rewrote the script so that it should work on OS 10.3x.
-- This version is not yet well tested. If you find any problems, please report them to me.

Thank you in advance!

Some conditions are necessary:

  1. I think the text must be mainly in English (or other Roman script languages), with Chinese text inside it. -- I am not sure if a file with all the text in Chinese can be converted with my script.
  2. I think the OS version must be 10.2 or above. OS 10.4x is recommended (this is required to make the converted files open automatically with MS Word when double-clicked).
  3. Chinese text in original files must be written using a TrueType font such as Apple LiSung Light (you cannot use fonts such as Taipei Mono or *Song*).

  4. For the first version I uploaded, I wrote the following condition:
    If the file contains special diacritical characters which require non-standard encoded fonts, they will not be converted to Unicode (this can be done rather easily, if there is any demand...)

    -- New: Well, there was a demand for the conversion of files written in the Appeal font into Unicode, and I could implement this... (see below).

The conversion process is easy:

Here are the steps for files written with ordinary fonts (Times, Palatino, etc.)

  1. Convert your original Word 5, or 5.1 files into rtf format using Word 5 or 5.1.
  2. Drop these rtf files on the AppleScript droplet called "convertWord5Chinese.app"; this will generate converted rtf files in the same folder as the original files. The file names will be:
    xxx_conv.rtf
    If there is already a file of the same name, it will be, for example
    xxx_conv1.rtf, etc.
  3. Open the converted rtf files with Word 2004 (or other OS X versions of Word). If you use OS 10.4 or above, you can simply double-click on the converted files to open them with the latest installed Word (sometimes, Finder may display TextEdit's rtf file icon, but it will change automatically to Word's icon sometime later...). Save the converted files in Word's native format (.doc format).

So, the process is in three steps: 1. convert your file into rtf format using Word 5.0/5.1 (manual step); 2. drop the rtf file onto the AppleScript droplet (automated step); 3. open the converted file using Word 2004, and save it into .doc format (manual step).
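The naming rule for the converted files (xxx_conv.rtf, then xxx_conv1.rtf, and so on) can be sketched as follows in Python. This only illustrates the rule described above; it is not taken from the droplet, and the function name converted_path is hypothetical.

```python
from pathlib import Path


def converted_path(source):
    """Return a free output path next to `source`, following the naming rule:
    xxx_conv.rtf first, then xxx_conv1.rtf, xxx_conv2.rtf, ... if taken."""
    source = Path(source)
    candidate = source.with_name(source.stem + "_conv.rtf")
    counter = 1
    while candidate.exists():
        candidate = source.with_name("%s_conv%d.rtf" % (source.stem, counter))
        counter += 1
    return candidate


if __name__ == "__main__":
    # Prints the name the converted copy of "thesis.rtf" would receive
    # in the current folder.
    print(converted_path("thesis.rtf").name)
```

Because the function only picks a name that is not yet in use, the original file and any earlier converted copies are never overwritten.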

New:
If your original files are written using Appeal (for special diacritical characters -- for this issue, please have a look at another page of mine entitled East Asian Diacritical Fonts and Unicode, from where you can download the Appeal font as well...), you should instead use another droplet named "convertWord5Chinese_appeal.app".

The font in the original files may be specified as Appeal, but you can use another, more ordinary font as well (such as Times). In that case, of course, the special diacritical characters will not be displayed correctly, but this does not matter in the resulting files.

The steps are basically the same, with a few differences.

  1. Convert your original Word 5, or 5.1 files into rtf format using Word 5 or 5.1.
  2. Drop these rtf files on the AppleScript droplet called "convertWord5Chinese_appeal.app"; a dialog will appear, for each file, asking you: "Do you want to do a <Appeal to Unicode> conversion?" with three buttons "Cancel", "Yes", "No". The default button is "No", but if your files use Appeal's diacritical characters, click on the button "Yes". This will generate converted rtf files in the same folder as the original files. The file names will be:
    xxx_conv.rtf
    If there is already a file of the same name, it will be, for example
    xxx_conv1.rtf, etc.
  3. Open the converted rtf files with Word 2004 (or other OS X versions of Word). If you use OS 10.4 or above, you can simply double-click on the converted files to open them with the latest installed Word (sometimes, Finder may display TextEdit's rtf file icon, but it will change automatically to Word's icon sometime later...).
    Note on the English font:
    If your original files use Appeal for the English font, it will be changed automatically to the Unicode font named Gandhari Unicode (if you don't have this font, I think Times New Roman, the default font for Word, will be used instead [but I am not really sure...]); otherwise, some other font (e.g. Times New Roman?) will be used. Characters which cannot be displayed in that font will be displayed in some other font (on my computer, the font Sanskrit 2003 is used...). At this point, you can do a "Select All" and change the English font of the whole file to a font of your choice. One thing to keep in mind is that you should use a font having all four typefaces, i.e. not only Roman (or Regular) and Italic, but also Bold and Bold Italic.

    Now, save the converted files in Word's native format (.doc format).

As you see, this is perhaps a little more complicated than the conversion of files using ordinary fonts, but the basic steps are the same.
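In principle, an <Appeal to Unicode> conversion amounts to looking up each special character of the legacy font in a table and writing the corresponding Unicode character instead. The Python sketch below shows the idea only. The byte values in SAMPLE_TABLE are placeholders invented for this example and are NOT Appeal's real layout, and the function name remap_font_escapes is also hypothetical.

```python
import re

# HYPOTHETICAL table: these byte values are placeholders chosen for the
# example; they do not describe the real layout of the Appeal font.
SAMPLE_TABLE = {
    0x8C: "\u0101",  # a with macron
    0x8D: "\u012B",  # i with macron
}

# A single RTF hex escape such as \'8c
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def remap_font_escapes(rtf_text, table):
    """Replace hex escapes found in `table` with RTF \\uN? Unicode escapes."""

    def repl(match):
        char = table.get(int(match.group(1), 16))
        if char is None:
            return match.group(0)  # byte not in the table: keep it as it is
        return "\\u%d?" % ord(char)

    return HEX_ESCAPE.sub(repl, rtf_text)


if __name__ == "__main__":
    print(remap_font_escapes(r"Tath\'8cgata", SAMPLE_TABLE))
```

In a file that also contains Chinese text, such a replacement would have to be limited to the runs set in the legacy diacritical font, since the same byte values may also occur inside double-byte Chinese characters.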

If there is demand for other special fonts (listed on the page East Asian Diacritical Fonts and Unicode), I can easily write a script. Please write to me, specifying the font you would like to be supported.

I am not sure that my script deals correctly with every possible case. But the script is certainly better than the first version I uploaded (on January 19, 2006). If you find any oddities, please write to me, describing the problem and sending me the rtf file that you dropped on my droplet.

I *think* that Word 5.0/5.1 have the same problem with Japanese (or other "double-byte languages"), but I am not sure...

Please note that I don't own Word 5.0/5.1, so I cannot do extensive testing.


Download

Please download my scripts from this link (132K to download).

The expanded folder will contain:

I included the Perl script only to show how the conversion works, for those who would be interested. As you can read in that script, I borrowed many lines from a script that Kino wrote, named ImportClarisRTF.pl. I would like to express my gratitude to him.

I have also been greatly helped by Philip Clart and Morten Schlütter, who tested the scripts.

I hope this script will be useful for your work!

I would appreciate any feedback, bug reports or suggestions. Thank you in advance.

Have fun!





This page was last built with Frontier on a Macintosh on Wed, Jan 31, 2007 at 10:19:12 PM. Thanks for checking it out! Nobumi Iyanaga