Part of Nobumi Iyanaga's website. n-iyanag@ppp.bekkoame.or.jp. 9/18/97.


Multilingual Web Authoring Kit for Frontier

user.html.language:
*ReadMe*

[I am preparing a Japanese version of this ReadMe. It should be available in a few days.]

To download "Multilingual Web Authoring Kit for Frontier", please click here (78 KB).

If your web browser can display Japanese, please set its character set to Latin 1 or MacRoman to read this page.

Introduction:
This is an alpha version of a new set of scripts that lets you create web pages in Japanese, and probably in any other double-byte language, within the Frontier environment.

These scripts use the MWU ("Multilingual/Multibyte Web-authoring Utilities") suite created by Hideaki Iimori <iimori@libri.bekkoame.or.jp>, which you can download from his web site (for convenience, I have included the latest MWU suite in this package):

http://www.bekkoame.or.jp/~iimori/sw/MWUsuites.html

I had been writing a script set called "Japanese Web-Authoring Kit for Frontier". The MWU suite has now improved so much (as of version 1.1b2) that almost nothing remains to be done to make web pages in any language supported by the Mac OS: the suite does all the essential work, and we only have to call the appropriate MWU scripts at the appropriate places. So I decided to rewrite this script set entirely, call it "Multilingual Web-authoring Kit for Frontier", and move it to a higher-level location, user.html.language.

Iimori-san added another UCMD in the MWU suite version 1.1b3, which is "preEvaluate ()".

I have also included the people.[user.initials].print () script, which is used by the user.html.language.test () script (new in the alpha 5 version).

For those who used my earlier "Japanese Web-Authoring Kit for Frontier": user.html.macros.language is no longer necessary. You can simply delete it if you want, or keep it; I do not think it will interfere with the new kit. Either way, you have to paste the patched scripts again.

The MWU suite now contains five important UCMDs, all of which are "WorldScript savvy":

One important concept in these UCMDs is the parameter called "scriptCode": it is a number representing the script code of each language supported by the Mac OS, e.g. 0 for the Roman (U.S.) System, 1 for the Japanese System, 2 for the Traditional Chinese System, etc. One of these numbers is passed to the MWU scripts as their last required parameter. -1 represents the script code of the current System.

MWU.NProcessMacros () is a "WorldScript savvy" version of string.processHtmlMacros (flaps, activeURLs, clayCompatibility, osacallback). The parameter "embeddedcode" is the equivalent of "osacallback" in string.processHtmlMacros () and must be the address of an OSA compiled object. MWU.NProcessMacros () can now evaluate macros that have one or more string literal parameters in double-byte characters (e.g. somemacro picture); it automatically supplies a supplementary "\" if any of the two-byte string literals contains a "\" as the second byte of a character.

MWU.countFields () is a "WorldScript savvy" version of string.countFields (s, chdelim).

MWU.nthField () is a "WorldScript savvy" version of string.nthField (s, chdelim, n).

MWU.preEvaluate () adds a supplementary "\" if the string s, in double-byte code, contains a "\" as the second byte of a character.

MWU.twobyteFixer () encodes the characters "\", "{", "}", "@" and "«", which cause problems during html rendering in the Frontier html suite (actually, you can specify any character to encode). A Japanese word like "yoyaku picture" ("ó\ñÒ" as a one-byte string) is encoded in this format: "{char(0x97)+char(0x5c)}yaku picture". As you can see, the first character, which contains the problematic byte "\", is encoded in Frontier's html macro style, so that when it is passed to the script html.processMacros (), the original characters are restored. The parameter "escapeChars" is optional; it defaults to the problematic characters listed above, but it can be any character. The parameter "excludeQuoted" is also optional, true or false, defaulting to true. When it is set to true, the script ignores all strings enclosed in double quotes. This is very useful, because double-quoted glossary entries must not be encoded if they are to be looked up in the glossary tables.
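The encoding described above can be sketched in a few lines. This is an illustration in Python, not UserTalk, and it is not Iimori-san's actual code: the function names are hypothetical, the two-byte test assumes the Shift-JIS lead-byte ranges, and only two-byte characters whose second byte is problematic are rewritten. The "excludeQuoted" option is not modelled.

```python
# Sketch only: imitates the output format described for MWU.twobyteFixer (),
# it is NOT the real implementation. All names here are hypothetical.

# Bytes for "\", "{", "}", "@" and MacRoman "«" (0xC7).
ESCAPE_BYTES = frozenset(b"\\{}@\xc7")


def is_lead_byte(b):
    """True if b is the first byte of a two-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def fix_two_byte(data):
    """Rewrite two-byte characters whose second byte is in ESCAPE_BYTES
    as a Frontier-style macro: {char(0xNN)+char(0xNN)}."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if is_lead_byte(b) and i + 1 < len(data):
            trail = data[i + 1]
            if trail in ESCAPE_BYTES:
                out += b"{char(0x%02x)+char(0x%02x)}" % (b, trail)
            else:
                out += data[i:i + 2]  # harmless two-byte character, copy as is
            i += 2
        else:
            out.append(b)  # one-byte character, left untouched
            i += 1
    return bytes(out)


# "yoyaku" is the byte sequence 0x97 0x5C 0x96 0xF1 ("ó\ñÒ" in MacRoman).
print(fix_two_byte(b"\x97\x5c\x96\xf1"))
```

Only the first character is rewritten; the second one (0x96 0xF1) has no problematic byte and passes through unchanged.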


Iimori-san is also the author of some other important UCMDs, such as jcode suite:

or OSAXen such as TextInfo OSAX:

Transliterate OSAX:

WrapText OSAX:

which are all "WorldScript savvy".


This package also needs the regex suite. Its latest version can be downloaded from:

For background on the problems of web authoring in double-byte languages in Frontier, please refer to the web site started by Philip Suh (on Aug 6, 1997), at:

For now, this script set supports only the authoring of Japanese web pages. But I think that the other two-byte encoded languages (Simplified Chinese, Traditional Chinese and Korean) can be supported with almost the same scripts.


What are the advantages of the "Language Framework"?


How to use...:

First, paste the patched scripts, which you will find in user.html.language.patchedScripts, into their respective places:

Before any processing, the patched script first calls user.html.language.getLanguageDirective (), which looks in the rendered object, its parent tables and the user.html.prefs table to see whether the "language" directive is defined; if it is, it stores the value in html.data.page.language. Then, if the language directive is defined, the patched script calls user.html.language.[language name].pageFilter () just before the call to pageFilter (), and user.html.language.[language name].finalFilter () just before the call to finalFilter (). Finally, it deletes html.data.page.language and puts its value in user.html.language.lastLanguage. (Japanese is the only language supported for now.)
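The lookup and the filter ordering described above can be modelled roughly as follows. This is a Python sketch with plain dictionaries standing in for Frontier tables; the function names are hypothetical, and the search order (page first, then parents from nearest to farthest, then prefs) is my assumption, not something taken from the actual script.

```python
# Sketch only: dictionaries stand in for Frontier tables; names are hypothetical.

def get_language_directive(page, parents, prefs):
    """Return the first "language" value found in the page, then in its
    parent tables (nearest first), then in the prefs table; else None."""
    for table in [page, *parents, prefs]:
        if "language" in table:
            return table["language"]
    return None


def filter_order(language):
    """List the filter calls in the order the patched script makes them."""
    steps = []
    if language is not None:
        steps.append("user.html.language.%s.pageFilter" % language)
    steps.append("pageFilter")
    if language is not None:
        steps.append("user.html.language.%s.finalFilter" % language)
    steps.append("finalFilter")
    return steps


page = {"title": "Test"}
site = {"language": "Japanese"}
language = get_language_directive(page, [site], {})
for step in filter_order(language):
    print(step)
```

When no language directive is found, only the ordinary pageFilter and finalFilter steps remain, which matches the unpatched behaviour.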

The patched script calls string.nthField (s, "«", 1), which in turn calls MWU.nthField (s, "«", 1, scriptCode).

The patched script calls MWU.countFields () with the appropriate script code, which is given by the script user.html.language.scriptCode (). Our scripts do not use this script, but it is certainly useful for multilingual web authoring in Frontier.

The patched script returns the "string.lower'ed" form of the directive value if it finds it in the html.data.page table; it also accepts "on/off" as meaning "true/false". In addition, it adds two "default" prefs:

"charset" << added Wed, Jan 1, 1997 at 11:48:47 PM by NI
   return ("iso-8859-1")
"includeMetaCharset" << added Wed, Sep 3, 1997 at 11:38:28 by NI
   return (false)

With this patch, string.nthField (s, chdelim, n) calls MWU.nthField (s, chdelim, n, scriptCode), the scriptCode parameter being passed by user.html.language.scriptCode ().
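To see why a script-aware nthField matters, here is a small Python sketch (not UserTalk, and not the MWU code). It splits a byte string on a delimiter byte while skipping over two-byte characters, so that a "«" (0xC7) appearing as the second byte of a character is not mistaken for a delimiter. The function names are hypothetical and the lead-byte test assumes Shift-JIS.

```python
# Sketch only: a script-aware field splitter in the spirit of MWU.nthField ()
# and MWU.countFields (). Names are hypothetical.

def is_lead_byte(b):
    """True if b is the first byte of a two-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_fields(data, delim):
    """Split data on the byte delim, ignoring delim when it is the
    second byte of a two-byte character."""
    fields = []
    start = 0
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]) and i + 1 < len(data):
            i += 2  # skip the whole two-byte character
            continue
        if data[i] == delim:
            fields.append(data[start:i])
            start = i + 1
        i += 1
    fields.append(data[start:])
    return fields


def nth_field(data, delim, n):
    """1-based field access, like string.nthField (s, chdelim, n)."""
    fields = split_fields(data, delim)
    return fields[n - 1] if 1 <= n <= len(fields) else b""


def count_fields(data, delim):
    return len(split_fields(data, delim))


# 0x83 0xC7 is one two-byte character; the later lone 0xC7 is a real "«".
line = b"\x83\xc7 text \xc7 comment"
print(nth_field(line, 0xC7, 1))   # script-aware: keeps the whole character
print(line.split(b"\xc7")[0])     # naive split: cuts the character in half
```

The naive split returns only the orphaned lead byte 0x83, which is the truncation problem the patch is meant to avoid.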

The patched script checks whether the language directive is defined; if it is, it passes the string parameter to MWU.NProcessMacros () with the corresponding scriptCode parameter. If the scriptCode is 0 (i.e. Roman) and the isoFilter pref is set to true, it returns the "isoFilter'ed" string; in any other case, i.e. when the language directive is defined with a non-Roman script, the isoFilter is omitted.

The patched script has been changed so that you can use macros inside glossary entries. The macros must be written as usual in the format "{xxx}" [without quotes!], but the escape character "\" before "{" is not supported. This script is necessary if you render a script object that has both string literals in Japanese and glossary entries in Japanese. It can be useful in other cases too.

If the rendered object is a script object, the patched script checks whether the language directive is defined and, if it is, passes the object's address to the script user.html.language.[language name].scriptPageFilter (adr).

The patched script stores the title directive in html.data.page.title as a string literal, without evaluating it, provided that the title directive is given as a string literal (enclosed in double quotes). If a title in Japanese contains characters with "\" as the second byte, the backslash is dropped when the title is passed through "evaluate ()". This patched script avoids the problem.

or (preferably?) [new in the alpha 5 version]

The patched script calls MWU.preEvaluate () just before it evaluates the directive. If a directive in Japanese contains characters with "\" as the second byte, the backslash is dropped when the directive is passed through "evaluate ()". MWU.preEvaluate () adds a supplementary "\" in such cases, so that the backslash becomes "\\". This script is more general than the earlier one (user.html.language.patchedScripts.runDirective), and its speed seems to be more or less the same. Please see the version history at the bottom of this page.

Of course, I strongly recommend keeping the original scripts from the html suite, as well as toys.commentDelete (), string.countFields () and string.nthField (). You can keep them, for example, as suites.html.["buildObject backup"], etc.
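The effect of the supplementary backslash can be checked with a small Python model (not UserTalk). Here drop_backslashes () is only a stand-in for what happens to "\" when a string goes through evaluate (); it is an assumption for illustration, as are both function names and the Shift-JIS lead-byte ranges.

```python
# Sketch only: models the backslash loss and the MWU.preEvaluate () remedy.
# Both functions are hypothetical stand-ins, not the real verbs.

def is_lead_byte(b):
    """True if b is the first byte of a two-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def pre_evaluate(data):
    """Double every "\" (0x5C) that is the second byte of a two-byte character."""
    out = bytearray()
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]) and i + 1 < len(data):
            out += data[i:i + 2]
            if data[i + 1] == 0x5C:
                out.append(0x5C)  # the supplementary backslash
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def drop_backslashes(data):
    """Stand-in for evaluation: a backslash is consumed as an escape
    and only the byte that follows it is kept."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x5C and i + 1 < len(data):
            out.append(data[i + 1])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


word = b"\x97\x5c\x96\xf1"                   # "ó\ñÒ" in MacRoman
print(drop_backslashes(word))                # corrupted: the 0x5C is gone
print(drop_backslashes(pre_evaluate(word)))  # intact: original bytes restored
```

Without the extra backslash the word collapses to 0x97 0x96 0xF1 ("óñÒ"); with it, the original four bytes survive.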


Now, to create web pages in Japanese, you have to define a new directive in each web site table, in each web page, or in the user.html.prefs table:

#Language "Japanese"

That's all.

Now, you should be able to write Japanese text in your page text, in your template, glossary entries, titles, and use macros that insert Japanese text in your page, just as if you were creating English or any other roman language web pages in Frontier. You will be able to write your own "pageFilter ()", "finalFilter ()" or scripts in "tools" table.

Important notes:
[A]

In general, it is recommended not to use string literals in Japanese in macros or script objects. The best way is to put them in wpText objects and call them like this:

string (wpTextObject containing string literals)
-- e.g. rather than writing a macro like this:
{popText ("yoyaku picture", "Hello!", 20)}

it is better to write it in this way:

{popText (string (myWord), "Hello!", 20)}

where "myWord" is the name of a wpText object containing the word "yoyaku picture".

This problem is briefly explained in Iimori-san's Frontier related web pages, especially at:

<http://www.bekkoame.or.jp/~iimori/progtips/frontierj2.html>

In fact, the only string literals that cause problems are those containing "\" as the second byte of a character. Here is the list of these characters:

charlist1 picture

("translated" into one byte, these characters are:

Å\, É\, Ñ\, Ü\, â\, ä\, ã\, å\, ç\, é\, è\, ê\, ë\, í\, ì\, î\, ï\, ñ\, ó\, ò\, ô\, ö\, õ\, ú\, ù\, û\, ü\, [dbl dagger]\, ·\, [base ']\, [base "]\, [per thou]\, Â\, Ê\, Á\, Ë\, È\, Í\, Î\ )

MWU.NProcessMacros () (from its version 1.1b1 onward) supplies automatically a supplementary "\" to these characters, so that the problem is solved for the macros. Nevertheless, it is a good habit, IMHO, to write macros in this way:

{popText (string (myWord), "Hello!", 20)}

On the other hand, if you simply type or paste these characters into string cells, the second-byte "\" is automatically dropped when the cell is updated (by hitting the Enter key or closing the table window), so the characters become corrupt. To see this, you only have to change the font of the table to a Japanese font (e.g. Osaka). For example, if you enter "yoyaku picture" (in one-byte translation: "ó\ñÒ") in a string cell, it becomes "ran picture" (in one-byte translation: "óñÒ"). To avoid this, first change the font of the table to a Roman font, type or paste the word, and manually add a "\" before the "\" of the problematic character. For example, the word "yoyaku picture" (in a Roman font: "ó\ñÒ") should be written "ó\\ñÒ"; if you then change the font to a Japanese font, it is shown in this way: "yoyaku2 picture"...

[B]
The patched script html.refGlossary () is necessary when you render a script object containing both string literals in Japanese and glossary entries in Japanese. Here is an example (I borrowed this snippet from test code written by Philip Suh):

html.data.page.title = "Script 1"
html.data.page.language = "Japanese"
html.data.page.subtext = "yoyaku picture"
local
   htmltext = ""
   s 
   br = "<br>"
   hr = "<hr>"
on add (s)
   htmltext = htmltext + s

add ("tadaimanojikan picture" + clock.now () + "<br>")
add (br + br + "<h3>Subtext is:" + html.data.page.subtext + "</h3>")
add ("Raw subtext in a script object: yoyaku picture<br>")
add ("a glossary entry: \"yumiglos picture\"<br>") << glossary entries with "problematic characters" are encoded, and decoded in html.refGlossary ()
...........

To render a script object containing string literals in Japanese, we have to encode all the lines of the script, because the string literals could contain characters with "\" or "«" as the second byte, which cause problems. MWU.twobyteFixer () has a very useful optional parameter, "excludeQuoted": when it is set to true, all strings enclosed in double quotes are excluded from the encoding. This is necessary because string.processHTMLMacros () (and MWU.NProcessMacros () as well) passes all strings enclosed in double quotes directly to html.refGlossary (), without dropping the "\" in these strings.

However, when dealing with a script object containing string literals in Japanese, that optional parameter must be set to false, because these string literals are always enclosed in double quotes (e.g. the line "add ("Raw subtext in a script object: yoyaku picture<br>")" in the example script quoted above). This means that even the glossary entries are encoded by this process (see this line in the script "user.html.language.Japanese.scriptPageFilter ()":

str = MWU.TwoByteFixer (str, "«", false, theCode)

).

As the glossary entries are encoded, they must be decoded before being looked up in the glossary tables. This is why the patched script html.refGlossary () is necessary when you render a script object containing both string literals in Japanese and glossary entries in Japanese.

[C]
As is well known, a line in an outline object is limited to 255 characters. We had to encode outline objects because some outline renderers (such as user.html.renderers.newCulture (), which has this line:

s = string.nthField (s, "«", 1)

) truncate the lines when they encounter the character "«" ("thinking" that what is after this character is comments).

We used to encode the outline objects with MWU.twobyteFixer (), which encodes "«" as "{char(0xC7)}"; that is, one character was replaced by 12 characters. If a line contained many Japanese characters with "«" as the second byte, and the line was already very long (near the 255 limit), the encoded line became too long, and we got unexpected results.

This problem is now solved thanks to the new verb introduced in the MWU suite, i.e. MWU.nthField (), which we call from string.nthField () as well as from string.commentDelete () and toys.commentDelete ().
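The growth is easy to quantify. The token "{char(0xC7)}" is 12 characters long, so each replaced "«" adds 11 characters to the line. The Python sketch below only does that arithmetic; the helper name and the sample line lengths are made up for illustration.

```python
# Sketch only: arithmetic for the line-length problem described above.

TOKEN = "{char(0xC7)}"   # the encoded form of one "«" byte
LIMIT = 255              # maximum length of a line in an outline object


def encoded_length(line_length, problem_count):
    """Length of a line after each problematic byte is replaced by TOKEN."""
    return line_length + problem_count * (len(TOKEN) - 1)


for length, count in [(200, 3), (240, 2)]:
    new_length = encoded_length(length, count)
    status = "fits" if new_length <= LIMIT else "too long"
    print(length, count, new_length, status)
```

A 240-character line needs only two such characters to go past the limit.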

Anyway, here is the list of the Japanese characters containing "«" at the second byte:

charlist2 picture

"Translated" in the one byte characters, these characters are:

Ç«, É«, Ü«, à«, â«, ä«, ã«, å«, ç«, é«, è«, ê«, ë«, í«, ì«, î«, ï«, ñ«, ó«, ò«, ô«, ö«, õ«, ú«, ù«, û«, ü«, [dbl dagger]«, ·«, [base ']«, [base "]«, [per thou]«, «, Ê«, Á«, Ë«, È«, Ì«

[D]
The script user.html.language.getLanguageDirective () may cause some problems when dealing with some script objects or embedded scripts in outline objects, if they define or undefine the language directive. Here is an example:


local (language = false)
.....
if language == true
	html.data.page.language = "Japanese" « if "language" is true, set language directive
.....

In such a case, user.html.language.getLanguageDirective () will see only the line:

 html.data.page.language = "Japanese" « if "language" is true, set language directive

so that the language directive will be set to "Japanese", although the logic of the script itself may not want this. In such a case, you should write the script in this way:


local (language = false)
.....
if language == true
	html.data.page.language = "Japanese" « if "language" is true, set language directive
else
	if defined (html.data.page.language)
		delete (@html.data.page.language)
	.....
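The pitfall in [D] comes from reading the script text instead of running it. The Python sketch below models that kind of static scan; it is my own assumption about the behaviour described above, not the code of user.html.language.getLanguageDirective (), and the names are hypothetical.

```python
# Sketch only: a static scan that ignores control flow, to illustrate why
# the directive is picked up even when the "if" branch would never run.
import re

ASSIGNMENT = re.compile(r'html\.data\.page\.language\s*=\s*"([^"]*)"')


def scan_language(script_text):
    """Return the value of the first language assignment found in the
    script text, whatever condition surrounds it; None if there is none."""
    for line in script_text.splitlines():
        match = ASSIGNMENT.search(line)
        if match:
            return match.group(1)
    return None


script = (
    'local (language = false)\n'
    'if language == true\n'
    '\thtml.data.page.language = "Japanese"\n'
)
print(scan_language(script))
```

The scan reports "Japanese" although the condition is false, which is why the corrected script deletes html.data.page.language explicitly in its else branch.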


So, the only directive to remember, which you can define in user.html.prefs, in each web site table or in each web page, is:

#Language "Japanese"


Some technical points...:


If you have any problems using this set of scripts, please write to me, and I will probably be able to fix them.
Please remember that this is still an alpha version, so use this script set with caution!

Thank you in advance for any bug report, suggestions for improvements, and any feedback!

Nobumi Iyanaga Tokyo, Japan

n-iyanag@ppp.bekkoame.or.jp Wed, Sep 3, 1997 at 12:15:45 by NI


Credits:

The MWU suite has been written by Hideaki Iimori.

Our user.html.language.getLanguageDirective () is a modified version of directive.get () written by Jan Storms <jstorms@pi.net>.

The code of user.html.language.systemScriptCode () has been written by Iimori Hideaki.

I received a great deal of help from Yuichiro Sugiura and Philip Suh in the process of making this suite.


Version History

First release 		Wed, Sep 3, 1997

Second release		Thu, Sep 4, 1997

Third release		Fri, Sep 5, 1997

Fourth release		Fri, Sep 5, 1997

Fifth release		Thu, Sep 18, 1997


This page was last built with Frontier on a Macintosh on Thu, Sep 18, 1997 at 18:26:07. Thanks for checking it out! Nobumi Iyanaga