Multilingual Web Authoring Kit for Frontier

Part of Nobumi Iyanaga's website. n-iyanag@ppp.bekkoame.or.jp. 9/18/97.

user.html.language:
*ReadMe*

[I am preparing a Japanese version of this ReadMe. Please wait some days...!]
To download "Multilingual Web Authoring Kit for Frontier", please click here (78 KB).
If your web browser can display Japanese, please set its character set to Latin 1 or MacRoman to read this page.
Introduction:
This is an alpha version of a new set of scripts that lets users create web pages in Japanese, and probably in any other double-byte coded language, in the Frontier environment.
These scripts use the MWU ("Multilingual/Multibyte Web-authoring Utilities") suite created by Hideaki Iimori <iimori@libri.bekkoame.or.jp>, which you can download from his web site (for the convenience of users, I have included the latest MWU suite in this package):
http://www.bekkoame.or.jp/~iimori/sw/MWUsuites.html
I had been writing a script set called "Japanese Web-Authoring Kit for Frontier". The MWU suite is now so much improved (as of version 1.1b2) that we have almost nothing left to do to make web pages in any language supported by the Mac OS -- all the essential parts of the job are done by this suite, and we only have to call the appropriate MWU scripts at the appropriate places. So I decided to rewrite this script set entirely, call it "Multilingual Web-authoring Kit for Frontier", and place it at a higher level, user.html.language.
Iimori-san added another UCMD in the MWU suite version 1.1b3, which is "preEvaluate ()".
I also include the people.[user.initials].print () script, which is used in the user.html.language.test () script (new in the alpha 5 version).
For those who have used my earlier "Japanese Web-Authoring Kit for Frontier": user.html.macros.language is no longer necessary. You can simply delete it if you want, but you can also keep it: I do not think it will interfere with this new kit. In either case, you have to paste the patched scripts anew.
The MWU suite now contains five important UCMDs, all of which are "WorldScript savvy".
One important concept in these UCMDs is the parameter called "scriptCode": it is a number representing the script code of each language supported by the Mac OS, e.g. 0 for the Roman (U.S.) System, 1 for the Japanese System, 2 for the Traditional Chinese System, etc. One of these numbers is passed to the MWU scripts as their last required parameter. -1 represents the script code of the current System. The five UCMDs are:

MWU.NProcessMacros (s, flaps, activeURLs, clayCompatibility, embeddedcode, scriptCode)

is a "WorldScript savvy" version of string.processHtmlMacros (flaps, activeURLs, clayCompatibility, osacallback). The parameter "embeddedcode" is the equivalent of "osacallback" in string.processHtmlMacros () and must be the address of an OSA compiled object. MWU.NProcessMacros () can now evaluate macros that have one or more string-literal parameters in double-byte characters (e.g. ): it automatically supplies a supplementary "\" whenever a double-byte string literal contains a "\" at the second byte of a character.

MWU.countFields (s, chdelim, scriptCode)

is a "WorldScript savvy" version of string.countFields (s, chdelim).

MWU.nthField (s, chdelim, n, scriptCode)

is a "WorldScript savvy" version of string.nthField (s, chdelim, n).

MWU.preEvaluate (s, scriptCode)

adds a supplementary "\" wherever the double-byte string s contains a character with "\" at the second byte.

MWU.TwoByteFixer (s, escapeChars = "\\{}@«", excludeQuoted = true, scriptCode)

This script encodes five characters, "\", "{", "}", "@" and "«", that cause problems during html rendering in Frontier's html suite (in fact, you can specify any characters to encode). A Japanese word like "" ("ó\ñÒ" as a one-byte string) is encoded in this format: "{char(0x97)+char(0x5c)}". As you can see, the first character, which contains the problematic "\", is encoded in Frontier's html macro style, so that the original characters are restored when it is passed to the script html.processMacros (). The parameter "escapeChars" is optional; it defaults to the five problematic characters, but it can be any set of characters. The parameter "excludeQuoted" is also optional; it is true or false, and defaults to true. When it is set to true, the script ignores all strings enclosed in double quotes. This is very useful, because double-quoted glossary entries must not be encoded if they are to be looked up in the glossary tables.
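
The following is a minimal sketch, in Python rather than UserTalk, of the kind of encoding described above; it is not Iimori-san's implementation. It assumes that the text is a raw Shift-JIS byte string, that lead bytes fall in the ranges 0x81-0x9F and 0xE0-0xFC, and that only second bytes are rewritten. The names is_lead_byte and two_byte_fix are hypothetical, and the excludeQuoted option is left out.

```python
# Hypothetical re-creation of the idea behind MWU.TwoByteFixer (), not the real UCMD.
# Byte values of "\", "{", "}", "@" and the MacRoman "«".
ESCAPE_BYTES = {0x5C, 0x7B, 0x7D, 0x40, 0xC7}

def is_lead_byte(b):
    """True if b can start a two-byte Shift-JIS character (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def two_byte_fix(data, escape_bytes=ESCAPE_BYTES):
    """Rewrite two-byte characters whose second byte is a problematic
    character as a "{char(0x..)+char(0x..)}" macro; copy everything else."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if is_lead_byte(b) and i + 1 < len(data):
            trail = data[i + 1]
            if trail in escape_bytes:
                macro = "{char(0x%02x)+char(0x%02x)}" % (b, trail)
                out += macro.encode("ascii")
            else:
                out += bytes([b, trail])
            i += 2  # skip both bytes of the two-byte character
        else:
            out.append(b)  # single-byte character, copied unchanged
            i += 1
    return bytes(out)

# The sample word from the text: "ó\ñÒ" when viewed as a one-byte string.
sample = bytes([0x97, 0x5C, 0x96, 0xF1])
fixed = two_byte_fix(sample)
```

With the sample bytes, fixed holds the macro "{char(0x97)+char(0x5c)}" followed by the untouched second character. Single-byte "{" and "}" are left alone in this sketch, so ordinary macros in the text survive.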

Iimori-san is also the author of some other important UCMDs, such as jcode suite:

http://www.bekkoame.or.jp/~iimori/sw/jcodeUCMD.html
or OSAXen such as TextInfo OSAX:

http://www.bekkoame.or.jp/~iimori/sw/TextInfoOSAX.html
Transliterate OSAX:

http://www.bekkoame.or.jp/~iimori/sw/TransliterateOSAX.html
WrapText OSAX:

http://www.bekkoame.or.jp/~iimori/sw/WrapTextOSAX.html
which are all "WorldScript savvy".

This package also needs the regex suite. Its latest version can be downloaded from:

http://www2.gol.com/users/filsa/downloads.html (US)
http://www.udena.ch/LTODBS/suites/regex.sit.hqx (Europe), or
ftp://frontier:@flash.dmc.com/dmg/suites-regex-112b5.hqx
For background on the problems of web authoring in double-byte languages in Frontier, please refer to the web site started by Philip Suh (on Aug 6, 1997), at:

http://www2.gol.com/users/filsa/frontier/polyglot/
For now, this script set supports only the authoring of Japanese web pages. But I think that the other two-byte encoded languages (Simplified Chinese, Traditional Chinese and Korean) could be supported with almost the same scripts.

What are the advantages of the "Language Framework"?

From the point of view of web authoring in the Frontier environment, the main advantage is that you can make a multilingual web site with the language directive. You can define the language directive in each of your pages, so that a single web site can contain, for example, a Japanese page, an English page, a French page, a Traditional Chinese page, etc. By contrast, if you use the web site table's pageFilter or finalFilter to deal with special problems related to, for example, Japanese page rendering, your entire web site will be specialized for Japanese pages.

On the other hand, if the users define the language directive, they can design their web site, write their own pageFilter or finalFilter, etc., without being bothered by the problems related to the rendering of pages in one or other languages.

Finally, the MWU suite provides some important verbs in Frontier (such as string.countFields () or string.nthField ()) which are WorldScript savvy. The Language Framework allows the users to use these verbs with the script code that they need for their work. We can foresee here the possibility of making all the Frontier environment WorldScript savvy....

How to use...:
First, paste the patched scripts, which you will find in user.html.language.patchedScripts, into their respective places:

user.html.language.patchedScripts.buildObject at
html.buildObject

The patched script calls first user.html.language.getLanguageDirective () before any processing, which seeks in the rendered object, its parent tables and in the user.html.prefs table if the "language" directive is defined, and if it is, it enters the value in html.data.page.language. Then, if the language directive is defined, it calls user.html.language.[language name].pageFilter () just before the call to pageFilter (), and user.html.language.[language name].finalFilter (), just before the call to finalFilter (). Finally, it deletes html.data.page.language and puts its value in user.html.language.lastLanguage (Japanese is the only language supported for now...)

user.html.language.patchedScripts.commentDelete at
string.commentDelete

The patched script calls the script string.nthField (s, "«", 1), which in turn calls MWU.nthField (s, "«", 1, scriptCode).

user.html.language.patchedScripts.countFields at
string.countFields

The patched script calls MWU.countFields (), with the appropriate script code, which is given by the script user.html.language.scriptCode (). This script is not used from our scripts, but is certainly useful for the multilingual web-authoring in Frontier.

user.html.language.patchedScripts.getPref at
html.getPref
The patched script returns the "string.lower'ed" form of the directive value if it finds it in the html.data.page table; on the other hand, it allows the form "on/off" to mean "true/false". It adds also two "default" prefs, which are:
"charset" << added Wed, Jan 1, 1997 at 11:48:47 PM by NI
   return ("iso-8859-1")
"includeMetaCharset" << added Wed, Sep 3, 1997 at 11:38:28 by NI
   return (false)
user.html.language.patchedScripts.nthField at
string.nthField

With this patch, string.nthField (s, chdelim, n) calls MWU.nthField (s, chdelim, n, scriptCode), the scriptCode parameter being passed by user.html.language.scriptCode ().

user.html.language.patchedScripts.processMacros at
html.processMacros

The patched script sees if the language directive is defined; if it is, it passes the string parameter to MWU.NProcessMacros (), with the due scriptCode parameter. If the scriptCode is "0" (i.e. roman), and if the isoFilter pref is set to true, it returns the "isoFilter'ed" string; otherwise, if the language directive is defined, or if it is not "roman", the isoFilter is omitted.

user.html.language.patchedScripts.refGlossary at
html.refGlossary

The patched script is changed so that the users can use macros inside the glossary entries. The macros must be written as usual in the format "{xxx}" [without quotes!]. But the escaping character, "\", before "{" is not supported. This script is necessary if you render a script object having both string literals in Japanese, and glossary entries in Japanese. And anyway, it can be useful.

user.html.language.patchedScripts.renderObject at
html.data.standardMacros.renderObject

If the rendered object is a script object, the patched script checks whether the language directive is defined, and if it is, it passes the object's address to the script user.html.language.[language name].scriptPageFilter (adr).

user.html.language.patchedScripts.runDirective at
html.runDirective

The patched script enters the title directive in html.data.page.title as a string literal, without evaluating it, provided that the title directive is given as a string literal (enclosed in double quotes). If a title in Japanese contains characters having "\" at the second byte, the backslash is dropped when the title is passed through "evaluate ()". This patched script avoids this problem.
or (preferably?) [new in the alpha 5 version]

user.html.language.patchedScripts.runDirective2 at
html.runDirective
The patched script calls MWU.preEvaluate () just before it evaluates the directive. If a directive in Japanese contains some characters having "\" at the second byte, the backslash is dropped out when passed through "evaluate ()". MWU.preEvaluate () adds a supplementary "\\" in such cases. This script is more general than the earlier one (user.html.language.patchedScripts.runDirective), and from the point of view of the speed, it seems that this script is more or less the same as the other. -- Please see the version history, at the bottom of this page.
Of course, I strongly recommend keeping the original scripts of the html suite, toys.commentDelete (), string.countFields () and string.nthField (). You can keep them, for example, as suites.html.["buildObject backup"], etc.

Now, to create web pages in Japanese, you have to define a new directive in each of your web site tables, or in each web page, or in the user.html.prefs table:
#Language "Japanese"
That's all.
Now, you should be able to write Japanese text in your page text, in your template, glossary entries, titles, and use macros that insert Japanese text in your page, just as if you were creating English or any other roman language web pages in Frontier. You will be able to write your own "pageFilter ()", "finalFilter ()" or scripts in "tools" table.
Important notes:
[A]
In general, it is recommended not to use string literals in Japanese in macros or script objects. The best way is to put them in wpText objects, and call them in this way:

string (wpTextObject containing string literals)
-- e.g. it is better to write a macro like this:
{popText ("", "Hello!", 20)}
in this way:
{popText (string (myWord), "Hello!", 20)}
where "myWord" is the name of a wpText object in which you have written the word "".
This problem is briefly explained in Iimori-san's Frontier related web pages, especially at:
<http://www.bekkoame.or.jp/~iimori/progtips/frontierj2.html>
In fact, the only string literals that cause problems are those containing "\" at the second byte of a character. Here is the list of these characters:

("translated" into one byte, these characters are:
Å\, É\, Ñ\, Ü\, â\, ä\, ã\, å\, ç\, é\, è\, ê\, ë\, í\, ì\, î\, ï\, ñ\, ó\, ò\, ô\, ö\, õ\, ú\, ù\, û\, ü\, ‡\, ·\, ‚\, „\, ‰\, Â\, Ê\, Á\, Ë\, È\, Í\, Î\ )
MWU.NProcessMacros () (from its version 1.1b1 onward) supplies automatically a supplementary "\" to these characters, so that the problem is solved for the macros. Nevertheless, it is a good habit, IMHO, to write macros in this way:
{popText (string (myWord), "Hello!", 20)}
On the other hand, if you simply type or paste these characters into string cells, the second byte "\" is automatically dropped when the cell is updated (by hitting the Enter key or closing the table window), so that the characters become corrupt. To see this, you only have to change the font of the table to a Japanese font (e.g. Osaka). For example, if you enter "" (in one-byte translation: "ó\ñÒ") in a string cell, it becomes "" (in one-byte translation: "óñÒ"). To avoid this, you may want first to change the font of the table to a Roman font, type or paste the word, and manually add a "\" before the "\" of the problematic character. For example, the word "" (in a Roman font: "ó\ñÒ") should be written "ó\\ñÒ"; if you change the font to a Japanese font, it is shown in this way: ""...
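
The manual correction described above can be pictured with a short Python sketch (Python, not UserTalk; this is not the code of MWU.preEvaluate ()). It assumes raw Shift-JIS bytes with lead bytes in 0x81-0x9F and 0xE0-0xFC, and the function names are hypothetical. It doubles the "\" only where it is the second byte of a two-byte character, then decodes the result as MacRoman to show what a Roman font would display.

```python
def is_lead_byte(b):
    """True if b can start a two-byte Shift-JIS character (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def double_second_byte_backslash(data):
    """Append an extra "\\" after every two-byte character whose second
    byte is 0x5C; single-byte backslashes are left untouched."""
    out = bytearray()
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]) and i + 1 < len(data):
            out += data[i:i + 2]
            if data[i + 1] == 0x5C:
                out.append(0x5C)  # the supplementary backslash
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

# The word from the text, as raw bytes ("ó\ñÒ" in a Roman font).
word = bytes([0x97, 0x5C, 0x96, 0xF1])
escaped = double_second_byte_backslash(word)

# What the escaped bytes look like when displayed as MacRoman text.
as_roman = escaped.decode("mac_roman")
```

Here as_roman is the five-character string ó\\ñÒ, which is the form the text above says should be typed into the string cell.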
[B]
The patched script html.refGlossary () is necessary when you render a script object containing both string literals in Japanese and glossary entries in Japanese. Here is an example (I borrow this snippet from testing code written by Philip Suh):
html.data.page.title = "Script 1"
html.data.page.language = "Japanese"
html.data.page.subtext = ""
local
   htmltext = ""
   s 
   br = "<br>"
   hr = "<hr>"
on add (s)
   htmltext = htmltext + s

add ("" + clock.now () + "<br>")
add (br + br + "<h3>Subtext is:" + html.data.page.subtext + "</h3>")
add ("Raw subtext in a script object: <br>")
add ("a glossary entry: \"\"<br>")
<<glossary entries with  "problematic characters" are encoded, and decoded in html.refGlossary ()
...........
To render a script object containing string literals in Japanese, we have to encode all the lines of the script, because the string literals in Japanese could contain characters having "\" or "«" at the second byte, which will cause problems. MWU.TwoByteFixer () has a very useful optional parameter, "excludeQuoted": when it is set to true, all the strings enclosed in double quotes are excluded from the encoding. This is necessary because string.processHTMLMacros () (and MWU.NProcessMacros () as well) passes all the strings enclosed in double quotes directly to html.refGlossary (), without dropping the "\" in these strings.
However, when dealing with a script object containing string literals in Japanese, that optional parameter must be set to false, because these string literals are always enclosed in double quotes (e.g. the line "add ("Raw subtext in a script object: <br>")" in the example script quoted above). This means that even the glossary entries are encoded by this process (see this line in the script "user.html.language.Japanese.scriptPageFilter ()":
str = MWU.TwoByteFixer (str, "«", false, theCode)
).
As the glossary entries are encoded, they must be decoded before being looked up in the glossary tables. This is why "the patched script html.refGlossary is necessary when you render a script object containing string literals in Japanese and also glossary entries in Japanese".
[C]
As is well known, a line in an outline object is limited to 255 characters. We had to encode outline objects, because some outline renderers (such as user.html.renderers.newCulture (), which has this line:

s = string.nthField (s, "«", 1)
) truncate the lines when they encounter the character "«" ("thinking" that whatever follows this character is a comment).
We used to encode the outline objects with MWU.TwoByteFixer (), which encodes "«" as "{char(0xC7)}" -- that means that one character was replaced by 11 characters. If a line contained many Japanese characters with "«" at the second byte, and if that line was already very long (near the 255 limit), the encoded line became too long, and we got unexpected results.
This problem is now solved thanks to the new verb introduced in the MWU suite, i.e. MWU.nthField (), which we call from string.nthField () as well as from string.commentDelete () and toys.commentDelete ().
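
To illustrate why a double-byte-aware field splitter avoids the truncation, here is a small Python sketch (not UserTalk, and not the code of MWU.nthField ()). It assumes raw Shift-JIS bytes with lead bytes in 0x81-0x9F and 0xE0-0xFC; the function names and the double_byte switch are hypothetical. The delimiter is only recognized when it is a character of its own, not the second byte of a two-byte character.

```python
def is_lead_byte(b):
    """True if b can start a two-byte Shift-JIS character (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def split_fields(data, delim, double_byte=True):
    """Split data on the single byte delim. When double_byte is true,
    a delim byte that is the second byte of a two-byte character is skipped."""
    fields = []
    start = 0
    i = 0
    while i < len(data):
        if double_byte and is_lead_byte(data[i]) and i + 1 < len(data):
            i += 2  # step over the whole two-byte character
            continue
        if data[i] == delim:
            fields.append(data[start:i])
            start = i + 1
        i += 1
    fields.append(data[start:])
    return fields

def nth_field(data, delim, n, double_byte=True):
    """Return the n-th field (1-based), or an empty byte string if out of range."""
    fields = split_fields(data, delim, double_byte)
    return fields[n - 1] if 1 <= n <= len(fields) else b""

COMMENT = 0xC7  # the MacRoman byte for "«"

# A line with a two-byte character ending in 0xC7, then a real "«" comment.
line = b"text " + bytes([0x97, 0xC7]) + b" more " + bytes([COMMENT]) + b" comment"

naive = nth_field(line, COMMENT, 1, double_byte=False)  # cut inside the character
aware = nth_field(line, COMMENT, 1)                     # cut at the real comment
```

The naive call stops in the middle of the two-byte character, which is the truncation described above; the aware call keeps the character whole and stops only at the genuine comment sign.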
Anyway, here is the list of the Japanese characters containing "«" at the second byte:

"Translated" in the one byte characters, these characters are:
Ç«, É«, Ü«, à«, â«, ä«, ã«, å«, ç«, é«, è«, ê«, ë«, í«, ì«, î«, ï«, ñ«, ó«, ò«, ô«, ö«, õ«, ú«, ù«, û«, ü«, ‡«, ·«, ‚«, „«, ‰«, Â«, Ê«, Á«, Ë«, È«, Ì«
But the same problem still remains for script objects that contain string literals in double-byte code, because they are "outlines" by nature. The character "«" must be encoded if it is at the second byte of a double-byte character, because when scripts are executed, Frontier interprets it as the start of a comment. On the other hand, we must add a supplementary "\" if there are string literals in double-byte code which contain "\" at the second byte. This is the meaning of the following lines in the script user.html.language.Japanese.scriptPageFilter ():
if str contains "\"" AND str contains "\"
   str = MWU.TwoByteFixer (str, "«", false, theCode) << new in v. alpha 5
   str = MWU.preEvaluate (str, theCode) << new in v. alpha 5
   .....
By the second line, the character "«" is encoded as "{char(0xC7)}", i.e. one character is replaced by 11 characters. This problem is not solved in the current implementation of our scripts: although we encode only these characters, the line can still exceed the limit of 255 characters. So, in this case, please avoid writing long lines with string literals in Japanese in script objects!
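
The arithmetic behind this warning can be sketched in Python (not UserTalk). The sketch assumes that each problematic character is replaced by the macro form "{char(0xC7)}" mentioned in this note and that nothing else in the line changes; the function names are hypothetical.

```python
LINE_LIMIT = 255         # maximum length of one outline line
MACRO = "{char(0xC7)}"   # assumed encoded form of one problematic character

def encoded_length(line_length, problem_chars):
    """Length of a line after each problematic character is replaced by MACRO.
    Each replacement removes one character and adds len(MACRO)."""
    return line_length + problem_chars * (len(MACRO) - 1)

def fits_in_outline(line_length, problem_chars):
    """True if the encoded line still fits within LINE_LIMIT."""
    return encoded_length(line_length, problem_chars) <= LINE_LIMIT

# A 240-character line survives one encoded character but not two.
one_char_ok = fits_in_outline(240, 1)
two_chars_ok = fits_in_outline(240, 2)
```

Under these assumptions, a line that is already close to the limit leaves room for very few encoded characters, which is why short lines are the safe choice.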
[D]
The script user.html.language.getLanguageDirective () may cause some problems when dealing with some script objects or embedded scripts in outline objects, if they define or undefine the language directive. Here is an example:
local (language = false)
.....
if language == true
	html.data.page.language = "Japanese" « if "language" is true, set language directive
.....
In such a case, user.html.language.getLanguageDirective () will see only the line:
 html.data.page.language = "Japanese" « if "language" is true, set language directive
so that the language directive will be set to "Japanese", although the logic of the script itself may not want this. In such a case, you should write the script in this way:
local (language = false)
.....
if language == true
	html.data.page.language = "Japanese" « if "language" is true, set language directive
else
	if defined (html.data.page.language)
		delete (@html.data.page.language)
	.....
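
The reason for this behaviour can be pictured with a Python sketch (not UserTalk). It assumes that the directive is found by a plain pattern search over the text of the script; the pattern and the name scan_language are hypothetical and are not taken from user.html.language.getLanguageDirective (). A text search sees the assignment whether or not the surrounding "if" would ever run.

```python
import re

# Hypothetical pattern: an assignment to html.data.page.language with a quoted value.
ASSIGNMENT = re.compile(r'html\.data\.page\.language\s*=\s*"([^"]*)"')

def scan_language(script_text):
    """Return the first language value assigned anywhere in the text,
    ignoring control flow entirely, or None if there is no assignment."""
    match = ASSIGNMENT.search(script_text)
    return match.group(1) if match else None

script = (
    'local (language = false)\n'
    'if language == true\n'
    '\thtml.data.page.language = "Japanese"\n'
)

# The condition is false when the script runs, but the scan still finds the line.
found = scan_language(script)
```

In this sketch found is "Japanese" even though the script itself would never execute the assignment, which matches the weakness described in this note.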
So the only directive to remember, which you can define in user.html.prefs, in each web site table, or in each web page, is:
#Language "Japanese"

Some technical points...:

It is important to check whether the language directive is defined at the very first step of the page rendering process, because html.runDirective (), which enters the directives, has a call to toys.commentDelete () at its first line. If a directive in Japanese (or any other double-byte language) contained any character having "«" at the second byte, this directive would be truncated if the script code were not correctly returned by the script user.html.language.scriptCode () before that call to toys.commentDelete (). This is the task of user.html.language.getLanguageDirective (), which calls user.html.language.getNamedDirective (). (This is the main place in this script set where the regex suite is used. This is why we must set regex.options.threadFriendly to true at this point, saving its original value in user.html.language.regexOptionTemp, then restoring the original value at the last step, in the finalFilter ().)

Our user.html.language.Japanese.pageFilter () now only has to set the value of user.html.prefs.includeMetaCharset to false, saving its original value temporarily in user.html.language.Japanese.tempIincludeMetaCharset (that value is restored by our user.html.language.Japanese.finalFilter ()). In fact, the Japanese Mac OS uses the S-JIS code, so the charset should be defined as "x-sjis"; but as few browsers support the charset meta tag, we simply set includeMetaCharset to false.
On the other hand, html.data.standardMacros.pageheader () refers directly to the value of user.html.prefs.includeMetaCharset to decide if the charset meta tag is to be included (see the lines in this script:
if user.html.prefs.includeMetaCharset
   local (charset = html.getPref ("charset"))
   add ("<meta http-equiv=\"content-type\" content=\"text/html; charset=" + charset + "\">\r")
.) This is a bug IMHO: we should be able to define in each of our web site tables or web pages the value of includeMetaCharset, and so, the lines above should be:
if html.getPref ("includeMetaCharset")
   local (charset = html.getPref ("charset"))
   add ("<meta http-equiv=\"content-type\" content=\"text/html; charset=" + charset + "\">\r")
and accordingly, html.getPref () should return a "default" "includeMetaCharset" value, which is false (cf. our patched html.getPref ()).
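
The proposed behaviour can be sketched in Python (not UserTalk; this is not the patched html.getPref () itself). The sketch assumes that page directives are held in a dictionary with lower-case keys; the names get_pref and meta_charset_tag are hypothetical. It combines the two defaults quoted earlier, the "on/off" forms, and a page header that asks get_pref instead of reading user.html.prefs directly.

```python
# Defaults added by the patched getPref, as quoted earlier in this ReadMe.
DEFAULTS = {"charset": "iso-8859-1", "includemetacharset": False}

def get_pref(name, page):
    """Look up a directive in the page table (a dict with lower-case keys),
    lower-casing string values and mapping on/off to True/False;
    fall back to DEFAULTS when the page does not define it."""
    key = name.lower()
    if key in page:
        value = page[key]
        if isinstance(value, str):
            value = value.lower()
            if value in ("on", "true"):
                return True
            if value in ("off", "false"):
                return False
        return value
    return DEFAULTS.get(key)

def meta_charset_tag(page):
    """Return the charset meta tag if the page asks for it, else an empty string."""
    if get_pref("includeMetaCharset", page):
        charset = get_pref("charset", page)
        return '<meta http-equiv="content-type" content="text/html; charset=%s">\r' % charset
    return ""

default_tag = meta_charset_tag({})
sjis_tag = meta_charset_tag({"includemetacharset": "on", "charset": "x-sjis"})
```

With no directive the tag is omitted, and a single page can turn it on without touching the global prefs, which is the per-page control argued for above.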
Our user.html.language.Japanese.finalFilter () restores the values of user.html.prefs.includeMetaCharset and regex.options.threadFriendly; and if the renderedText still contains "{char(0x..)+char(0x..)}" type macros, it attempts to evaluate them (without referring to the tool tables, etc.).

Our script user.html.language.scriptCode () is probably the most important one in this script set. It is somewhat the equivalent of html.getPref ("Language"), but it supports script objects, and if it fails to get any defined language directive, it returns -1. This is the value of the current System script code.
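
As a rough picture of this lookup, here is a Python sketch (not UserTalk, and not the script itself). It assumes the search order described for the patched html.buildObject (the rendered object, then its parent tables, then user.html.prefs), models each table as a dictionary, and uses the script codes given in the introduction; all names are hypothetical.

```python
CURRENT_SYSTEM = -1                          # script code of the current System
SCRIPT_CODES = {"roman": 0, "japanese": 1}   # codes given in the introduction

def get_language_directive(page, parent_tables, prefs):
    """Return the first "language" value found in the page, then in its
    parent tables (nearest first), then in the prefs; None if undefined."""
    for table in [page] + list(parent_tables) + [prefs]:
        value = table.get("language")
        if value is not None:
            return value
    return None

def script_code(page, parent_tables, prefs):
    """Map the language directive to a script code, falling back to
    CURRENT_SYSTEM when no directive (or an unknown one) is found."""
    language = get_language_directive(page, parent_tables, prefs)
    if language is None:
        return CURRENT_SYSTEM
    return SCRIPT_CODES.get(language.lower(), CURRENT_SYSTEM)

# A page without its own directive inherits "Japanese" from its site table.
inherited = script_code({}, [{"language": "Japanese"}], {})
undefined = script_code({}, [{}], {})
```

Returning -1 when nothing is defined mirrors the fallback described above, so callers can pass the result straight on as a scriptCode parameter.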

Our script user.html.language.systemScriptCode () attempts to get the script code of the System ('itlc' resource), and puts it in user.systemScriptCode. But the system script code can be changed dynamically by applications, so it is safer to use -1 as the default script code to pass to the verbs in the MWU suite.

Anyway, if you have any problems using this set of scripts, please write me -- and I could probably fix them.
And anyway, this is still an alpha version -- so please use this script set with caution!
Thank you in advance for any bug report, suggestions for improvements, and any feedback!
Nobumi Iyanaga Tokyo, Japan
n-iyanag@ppp.bekkoame.or.jp Wed, Sep 3, 1997 at 12:15:45 by NI

Credits:
The MWU suite was written by Hideaki Iimori.
Our user.html.language.getLanguageDirective () is a modified version of directive.get () written by Jan Storms <jstorms@pi.net>.
The code of user.html.language.systemScriptCode () has been written by Iimori Hideaki.
Yuichiro Sugiura and Philip Suh helped me a great deal in the process of making this suite.

Version History
First release 		Wed, Sep 3, 1997
Package with MWU suite (v. 1.1b2) by Iimori-san
Second release		Thu, Sep 4, 1997
Two lines have been added in the script user.html.language.getLanguageDirective (), to avoid a rare error.
Third release		Fri, Sep 5, 1997
Two lines have been modified again in the script user.html.language.getLanguageDirective (), to avoid the same error.
user.html.language.scriptCode () has been simplified for a faster rendering.
Fourth release		Fri, Sep 5, 1997
Modified again in the script user.html.language.getLanguageDirective (), and user.html.language.getNamedDirective () to avoid more errors.
Added Important Notes: [D] to warn of the weakness of the script user.html.language.getLanguageDirective () in some rare cases.
Fifth release		Thu, Sep 18, 1997
Added at the bottom of the script user.html.language.patchedScripts.buildObject () these lines:
if language != true << Added Mon, Sep 8, 1997 by NI
	user.html.language.lastLanguage = language
	if defined (html.data.page.language)
		delete (@html.data.page.language)
I am not sure if this modification is good or needed.
Added user.html.language.patchedScripts.runDirective2 (), which uses the new MWU.preEvaluate () verb. This new patched "runDirective ()" is more general and lets users put any directive in Japanese (or any double-byte code). From the point of view of speed, runDirective2 () and the earlier "runDirective ()" seem to be roughly equal. But practically, the only directive that can be in double-byte code is the title directive. So I leave it to users to choose one or the other "runDirective ()".
Modified some lines in user.html.language.Japanese.scriptPageFilter (). It uses the new MWU.preEvaluate (), so that this script encodes now only the character "«"
Re-written some parts of the ReadMe
Added people.[user.initials].print () script to the package

This page was last built with Frontier on a Macintosh on Thu, Sep 18, 1997 at 18:26:07. Thanks for checking it out! Nobumi Iyanaga
