user.html.language:
*ReadMe* [I am preparing a Japanese version of this ReadMe. Please wait a few days...!]
To download "Multilingual Web Authoring Kit for Frontier", please click here (78 KB).
If your web browser can display Japanese, please set its character set to Latin 1 or MacRoman to read this page.
Introduction:
This is an alpha version of a new set of scripts that lets users create web pages in Japanese, and probably in any other double-byte coded language, in the Frontier environment. These scripts use the MWU ("Multilingual/Multibyte Web-authoring Utilities") suite created by Hideaki Iimori <iimori@libri.bekkoame.or.jp>, which you can download from his web site (for the convenience of users, I have included the latest MWU suite in this package): http://www.bekkoame.or.jp/~iimori/sw/MWUsuites.html
I had been writing a script set called "Japanese Web-Authoring Kit for Frontier". The MWU suite has now improved so much (as of version 1.1b2) that we have almost nothing left to do to make web pages in any language supported by the Mac OS: all the essential parts of the job are done by this suite, and we only have to call the appropriate MWU scripts at the appropriate places. So I decided to rewrite this script set entirely, call it "Multilingual Web-authoring Kit for Frontier", and place it at a higher level, user.html.language.
Iimori-san added another UCMD in the MWU suite version 1.1b3, which is "preEvaluate ()".
I also include the people.[user.initials].print () script, which is used in the user.html.language.test () script (new in the alpha 5 version).
For those who have used my earlier "Japanese Web-Authoring Kit for Frontier": user.html.macros.language is no longer necessary. You can simply delete it if you want, but you can also keep it: I think it will not interfere with this new kit. In either case, you have to paste the patched scripts anew.
The MWU suite now contains five important UCMDs, described below, which are all "WorldScript savvy".
One important concept in these UCMDs is the parameter called "scriptCode": it is a number representing the script code of each language supported by the Mac OS, e.g. 0 for the Roman (U.S.) System, 1 for the Japanese System, 2 for the Traditional Chinese System, etc. One of these numbers is passed to the MWU scripts as their last required parameter. -1 represents the script code of the current System.
MWU.NProcessMacros (s, flaps, activeURLs, clayCompatibility, embeddedcode, scriptCode) is a "WorldScript savvy" version of string.processHtmlMacros (flaps, activeURLs, clayCompatibility, osacallback). The parameter "embeddedcode" is the equivalent of "osacallback" in string.processHtmlMacros () and must be the address of an OSA compiled object. MWU.NProcessMacros () can now evaluate macros that have one or more string-literal parameters in double-byte characters (e.g.), automatically supplying a supplementary "\" if any of those string literals contains a "\" at the second byte of a double-byte character.
MWU.countFields (s, chdelim, scriptCode) is a "WorldScript savvy" version of string.countFields (s, chdelim).
MWU.nthField (s, chdelim, n, scriptCode) is a "WorldScript savvy" version of string.nthField (s, chdelim, n).
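The difference between a byte-oriented field verb and a "WorldScript savvy" one can be sketched in Python (not UserTalk), working on raw Shift-JIS bytes. This is only an illustration of the idea, not the MWU implementation: the helper names are hypothetical, the lead-byte ranges are those of Shift-JIS, and the scriptCode parameter is left out.

```python
def is_sjis_lead(b):
    """True if byte b starts a two-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_fields(s, delim):
    """Split bytes s on the single byte delim, never matching the second
    byte of a two-byte character."""
    fields, start, i = [], 0, 0
    while i < len(s):
        if is_sjis_lead(s[i]) and i + 1 < len(s):
            i += 2  # two-byte character: skip over its second byte
        elif s[i] == delim:
            fields.append(s[start:i])
            start = i + 1
            i += 1
        else:
            i += 1
    fields.append(s[start:])
    return fields


def count_fields(s, delim):
    return len(split_fields(s, delim))


def nth_field(s, delim, n):
    """Return the n-th field, counting from 1; empty if n is out of range."""
    fields = split_fields(s, delim)
    return fields[n - 1] if 1 <= n <= len(fields) else b""


# "abc", two double-byte characters (the first ends in 0x5C), "\", "def"
line = b"abc\x97\x5c\x96\xf1\\def"
print(line.split(b"\\"))         # byte-oriented split cuts inside 0x97 0x5C
print(split_fields(line, 0x5C))  # aware split finds only the real "\"
```

The byte-oriented split produces three fragments, one of which ends in an orphaned lead byte; the aware version yields the two fields a reader would expect.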
MWU.preEvaluate (s, scriptCode) adds a supplementary "\" if the string s is in double-byte code and contains a "\".
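As a rough Python sketch of what such a pre-evaluation step might do: double every "\" that sits at the second byte of a two-byte character, so that a later unescaping pass restores the original bytes. The function names are hypothetical, `toy_unescape` is only a stand-in for what evaluation does to backslashes, and doubling only second-byte backslashes is an assumption about MWU.preEvaluate (), not a description of it.

```python
def is_sjis_lead(b):
    """True if byte b starts a two-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def pre_evaluate(s):
    """Insert an extra 0x5C after any two-byte character whose second byte is 0x5C."""
    out = bytearray()
    i = 0
    while i < len(s):
        if is_sjis_lead(s[i]) and i + 1 < len(s):
            out += s[i:i + 2]
            if s[i + 1] == 0x5C:
                out.append(0x5C)  # the supplementary backslash
            i += 2
        else:
            out.append(s[i])
            i += 1
    return bytes(out)


def toy_unescape(s):
    """Stand-in for evaluation: a backslash is dropped and the next byte kept."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == 0x5C and i + 1 < len(s):
            out.append(s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return bytes(out)


word = b"\x97\x5c\x96\xf1"                # the two-character example word
print(toy_unescape(word))                 # second byte of the first character is lost
print(toy_unescape(pre_evaluate(word)))   # original bytes survive
```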
MWU.TwoByteFixer (s, escapeChars = "\\{}@«", excludeQuoted = true, scriptCode)
This script encodes five characters, "\", "{", "}", "@" and "«", that cause problems during HTML rendering in Frontier's html suite (actually, you can specify any characters to encode). A Japanese word like "" ("ó\ñÒ" as a one-byte string) is encoded in this format: "{char(0x97)+char(0x5c)}". As you can see, the first character, which contains the problematic "\", is encoded in Frontier's html macro style, so that when it is passed to the script html.processMacros (), the original characters are restored. The parameter "escapeChars" is optional; its default is the five problematic characters, but it can be any set of characters. The parameter "excludeQuoted" is also optional: it is true or false, and defaults to true. When this parameter is set to true, the script ignores all strings enclosed in double quotes. This is very useful, because double-quoted glossary entries must not be encoded if they are to be looked up in the glossary tables.
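The encoding described above could look roughly like this in Python. It is a sketch under assumptions, not Iimori-san's code: the function name is hypothetical, only two-byte characters whose second byte is in the escape set are encoded, "«" is taken to be MacRoman 0xC7, and quoted runs are tracked with a simple toggle on the double-quote byte.

```python
ESCAPE_BYTES = b"\\{}@\xc7"  # "\", "{", "}", "@" and MacRoman "«" (0xC7)


def is_sjis_lead(b):
    """True if byte b starts a two-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def two_byte_fixer(s, escape=ESCAPE_BYTES, exclude_quoted=True):
    """Rewrite problematic two-byte characters as {char(0x..)+char(0x..)} macros."""
    out = bytearray()
    in_quotes = False
    i = 0
    while i < len(s):
        b = s[i]
        if is_sjis_lead(b) and i + 1 < len(s):
            trail = s[i + 1]
            skip = exclude_quoted and in_quotes
            if trail in escape and not skip:
                out += b"{char(0x%02x)+char(0x%02x)}" % (b, trail)
            else:
                out += s[i:i + 2]
            i += 2
        else:
            if b == 0x22:  # a double quote opens or closes a quoted run
                in_quotes = not in_quotes
            out.append(b)
            i += 1
    return bytes(out)


word = b"\x97\x5c\x96\xf1"
print(two_byte_fixer(word))
print(two_byte_fixer(b'"' + word + b'"'))                        # left alone
print(two_byte_fixer(b'"' + word + b'"', exclude_quoted=False))  # encoded
```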
Iimori-san is also the author of some other important UCMDs, such as the jcode suite, and of OSAXen such as TextInfo OSAX, Transliterate OSAX and WrapText OSAX, which are all "WorldScript savvy".
This package also needs the regex suite. Its latest version can be downloaded from:
- http://www2.gol.com/users/filsa/downloads.html (US)
- http://www.udena.ch/LTODBS/suites/regex.sit.hqx (Europe), or
- ftp://frontier:@flash.dmc.com/dmg/suites-regex-112b5.hqx
For the background of the problems related to web authoring in double-byte languages in Frontier, please refer to the web site started by Philip Suh (on Aug 6, 1997), at:
For now, this script set supports only the authoring of Japanese web pages. But I think that any other two-byte encoded language (Simplified Chinese, Traditional Chinese and Korean) can be supported with almost the same scripts.
What are the advantages of the "Language Framework"?
- From the point of view of web authoring in the Frontier environment, the main advantage is that you can make a multilingual web site with the language directive. You can define the language directive in each of your pages, so that you can make, for example, a web site with a Japanese page, an English page, a French page, a Traditional Chinese page, etc. By contrast, if you use the web site table's pageFilter or finalFilter to deal with special problems related to, for example, Japanese page rendering, your entire web site becomes specialized for Japanese pages.
- On the other hand, if users define the language directive, they can design their web site, write their own pageFilter or finalFilter, etc., without being bothered by the problems of rendering pages in one language or another.
- Finally, the MWU suite provides WorldScript-savvy versions of some important Frontier verbs (such as string.countFields () or string.nthField ()). The Language Framework lets users call these verbs with the script code that they need for their work. We can foresee here the possibility of making the whole Frontier environment WorldScript savvy....
How to use...:
First, paste the patched scripts, which you will find in user.html.language.patchedScripts, to their respective places:
- user.html.language.patchedScripts.buildObject at html.buildObject
The patched script first calls user.html.language.getLanguageDirective () before any processing. That script looks in the rendered object, its parent tables and the user.html.prefs table to see if the "language" directive is defined, and if it is, it enters the value in html.data.page.language. Then, if the language directive is defined, the patched script calls user.html.language.[language name].pageFilter () just before the call to pageFilter (), and user.html.language.[language name].finalFilter () just before the call to finalFilter (). Finally, it deletes html.data.page.language and puts its value in user.html.language.lastLanguage. (Japanese is the only language supported for now...)
- user.html.language.patchedScripts.commentDelete at string.commentDelete
The patched script calls the script string.nthField (s, "«", 1), which in turn calls MWU.nthField (s, "«", 1, scriptCode).
- user.html.language.patchedScripts.countFields at string.countFields
The patched script calls MWU.countFields () with the appropriate script code, which is given by the script user.html.language.scriptCode (). This script is not used by our scripts, but it is certainly useful for multilingual web authoring in Frontier.
- user.html.language.patchedScripts.getPref at html.getPref
The patched script returns the "string.lower'ed" form of the directive value if it finds it in the html.data.page table; in addition, it allows the form "on/off" to mean "true/false". It also adds two "default" prefs, which are:
"charset" << added Wed, Jan 1, 1997 at 11:48:47 PM by NI
	return ("iso-8859-1")
"includeMetaCharset" << added Wed, Sep 3, 1997 at 11:38:28 by NI
	return (false)
- user.html.language.patchedScripts.nthField at string.nthField
With this patch, string.nthField (s, chdelim, n) calls MWU.nthField (s, chdelim, n, scriptCode), the scriptCode parameter being supplied by user.html.language.scriptCode ().
- user.html.language.patchedScripts.processMacros at html.processMacros
The patched script checks whether the language directive is defined; if it is, it passes the string parameter to MWU.NProcessMacros () with the appropriate scriptCode parameter. If the scriptCode is "0" (i.e. roman), and if the isoFilter pref is set to true, it returns the "isoFilter'ed" string; otherwise, if the language directive is defined, or if it is not "roman", the isoFilter is omitted.
- user.html.language.patchedScripts.refGlossary at html.refGlossary
The patched script is changed so that users can use macros inside glossary entries. The macros must be written as usual in the format "{xxx}" [without quotes!]. But the escaping character, "\", before "{" is not supported. This script is necessary if you render a script object having both string literals in Japanese and glossary entries in Japanese. And anyway, it can be useful.
- user.html.language.patchedScripts.renderObject at html.data.standardMacros.renderObject
If the rendered object is a script object, the patched script checks whether the language directive is defined, and if it is, it passes the object's address to the script user.html.language.[language name].scriptPageFilter (adr).
- user.html.language.patchedScripts.runDirective at html.runDirective
The patched script enters the title directive in html.data.page.title as a string literal, without evaluating it, on the condition that the title directive is given as a string literal (enclosed in double quotes). If a title in Japanese contains characters having "\" at the second byte, the backslash is dropped when the title is passed through "evaluate ()". This patched script avoids this problem.
or (preferably?) [new in the alpha 5 version]
- user.html.language.patchedScripts.runDirective2 at html.runDirective
The patched script calls MWU.preEvaluate () just before it evaluates the directive. If a directive in Japanese contains characters having "\" at the second byte, the backslash is dropped when the directive is passed through "evaluate ()". MWU.preEvaluate () adds a supplementary "\" in such cases. This script is more general than the earlier one (user.html.language.patchedScripts.runDirective), and in terms of speed the two seem to be more or less the same. -- Please see the version history, at the bottom of this page.
Of course, I strongly recommend keeping the original scripts in the html suite, as well as toys.commentDelete (), string.countFields () and string.nthField (). You can keep them, for example, as suites.html.["buildObject backup"], etc.
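The behaviour described above for the patched html.getPref () can be sketched in Python as follows. This is only an illustration built from the description in this ReadMe: the function name, the dictionary standing in for the html.data.page table, and the treatment of "true"/"false" strings are assumptions, and the real script consults other tables as well.

```python
# Defaults named in the ReadMe; keys are lowercased for case-insensitive lookup.
DEFAULT_PREFS = {"charset": "iso-8859-1", "includemetacharset": False}


def get_pref(name, page_table):
    """Look up a directive in page_table (a dict standing in for html.data.page)."""
    wanted = name.lower()
    for key, value in page_table.items():
        if key.lower() == wanted:
            if isinstance(value, str):
                value = value.lower()          # the "string.lower'ed" form
                if value in ("on", "true"):    # "on/off" means "true/false"
                    return True
                if value in ("off", "false"):
                    return False
            return value
    return DEFAULT_PREFS.get(wanted)           # None if there is no default


print(get_pref("charset", {}))
print(get_pref("isoFilter", {"isoFilter": "ON"}))
print(get_pref("Language", {"language": "Japanese"}))
```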
Now, to create web pages in Japanese, you have to define a new directive in each of your web site tables or web pages, or in the user.html.prefs table:
#Language "Japanese"
That's all.
Now, you should be able to write Japanese text in your page text, template, glossary entries and titles, and use macros that insert Japanese text in your page, just as if you were creating web pages in English or any other Roman-script language in Frontier. You will be able to write your own "pageFilter ()", "finalFilter ()" or scripts in the "tools" table.
Important notes:
[A]
It is recommended in general not to use string literals in Japanese in macros or script objects. The best way is to put them in some wpText objects, and call them in this way:
string (wpTextObject containing string literals)
-- e.g. it is better to write a macro like this:
{popText ("", "Hello!", 20)}
in this way:
{popText (string (myWord), "Hello!", 20)}
where "myWord" is the name of a wpText object in which you would have written the word "".
This problem is briefly explained in Iimori-san's Frontier-related web pages, especially at: <http://www.bekkoame.or.jp/~iimori/progtips/frontierj2.html>
In fact, the only string literals that raise problems are those which contain "\" at the second byte. Here is the list of these characters:
("translated" into one-byte characters, they are:
Å\, É\, Ñ\, Ü\, â\, ä\, ã\, å\, ç\, é\, è\, ê\, ë\, í\, ì\, î\, ï\, ñ\, ó\, ò\, ô\, ö\, õ\, ú\, ù\, û\, ü\, [dbl dagger]\, ·\, [base ']\, [base "]\, [per thou]\, Â\, Ê\, Á\, Ë\, È\, Í\, Î\ )
MWU.NProcessMacros () (from its version 1.1b1 onward) automatically supplies a supplementary "\" to these characters, so the problem is solved for macros. Nevertheless, it is a good habit, IMHO, to write macros in this way:
{popText (string (myWord), "Hello!", 20)}
On the other hand, if you simply type or paste these characters in string cells, the second-byte "\" is automatically dropped when the cell is updated by hitting the Enter key or closing the table window, so that the characters become corrupt. To see this, you only have to change the font of the table to a Japanese font (e.g. Osaka). For example, if you enter "" (in one-byte translation: "ó\ñÒ") in a string cell, it becomes "" (in one-byte translation: "óñÒ"). To avoid this, you may want to first change the font of the table to a Roman font, type or paste the word, and manually add a "\" before the "\" of the problematic character. For example, the word "" (in Roman font: "ó\ñÒ") should be written "ó\\ñÒ"; if you change the font to a Japanese font, it is shown in this way: ""...
[B]
The patched script html.refGlossary () is necessary when you render a script object containing both string literals in Japanese and glossary entries in Japanese. Here is an example (I borrow this snippet from a testing code written by Philip Suh):
html.data.page.title = "Script 1"
html.data.page.language = "Japanese"
html.data.page.subtext = "..........."
local
	htmltext = ""
	s
	br = "<br>"
	hr = "<hr>"
on add (s)
	htmltext = htmltext + s
add ("" + clock.now () + "<br>")
add (br + br + "<h3>Subtext is:" + html.data.page.subtext + "</h3>")
add ("Raw subtext in a script object:<br>")
add ("a glossary entry: ""<br>"`) <<glossary entries with "problematic characters" are encoded, and decoded in html.refGlossary ()
To render a script object containing string literals in Japanese, we have to encode all the lines of the script, because the string literals in Japanese could contain characters having "\" or "«" at the second byte, which will cause problems.
MWU.twobyteFixer () has a very useful optional parameter, "excludeQuoted": when it is set to true, all strings enclosed in double quotes are excluded from the encoding. This is necessary because string.processHTMLMacros () (and MWU.NProcessMacros () as well) passes all strings enclosed in double quotes directly to html.refGlossary (), without dropping the "\" in these strings.
However, when dealing with a script object containing string literals in Japanese, that optional parameter must be set to false, because these string literals are always enclosed in double quotes (e.g. the line "add ("Raw subtext in a script object:<br>")" in the example script quoted above). This means that even the glossary entries are encoded by this process (see, in the script "user.html.language.Japanese.scriptPageFilter ()", this line:
str = MWU.TwoByteFixer (str, "«", false, theCode)
).
As the glossary entries are encoded, they must be decoded before being looked up in the glossary tables. This is why "the patched script html.refGlossary is necessary when you render a script object containing string literals in Japanese and also glossary entries in Japanese".
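The decoding step mentioned here, turning "{char(0x..)+char(0x..)}" macros back into the original bytes before a glossary lookup, might be sketched in Python like this. The function name and the regular expression are assumptions made for illustration; the patched html.refGlossary () itself is written in UserTalk and may work differently.

```python
import re

# Matches {char(0xNN)} or {char(0xNN)+char(0xNN)}; the second char is optional.
CHAR_MACRO = re.compile(
    rb"\{char\(0x([0-9A-Fa-f]{2})\)(?:\+char\(0x([0-9A-Fa-f]{2})\))?\}"
)


def decode_char_macros(s):
    """Replace each char macro in the bytes s by the byte(s) it stands for."""
    def restore(match):
        return bytes(int(h, 16) for h in match.groups() if h is not None)
    return CHAR_MACRO.sub(restore, s)


encoded = b"{char(0x97)+char(0x5c)}\x96\xf1"
print(decode_char_macros(encoded))          # original four bytes
print(decode_char_macros(b"a{char(0xC7)}b"))
```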
[C]
As is well known, a line in an outline object is limited to 255 characters. We had to encode outline objects, because some outline renderers (such as user.html.renderers.newCulture (), which has this line:
s = string.nthfield (s, «, 1)
) truncate lines when they encounter the character "«" ("thinking" that whatever follows this character is a comment).
We used to encode outline objects with MWU.twobyteFixer (), which encodes "«" as "{char(0xC7)}" -- that means that one character was replaced by 11 characters. If a line contained many Japanese characters with "«" at the second byte, and if that line was already very long (near the 255 limit), the encoded line became too long...! and we had some unexpected results...
This problem is now solved thanks to the new verb introduced in the MWU suite, i.e. MWU.nthField (), which we call from string.nthField () as well as from string.commentDelete () and toys.commentDelete ().
Anyway, here is the list of the Japanese characters containing "«" at the second byte:
"Translated" into one-byte characters, these characters are:
Ç«, É«, Ü«, à«, â«, ä«, ã«, å«, ç«, é«, è«, ê«, ë«, í«, ì«, î«, ï«, ñ«, ó«, ò«, ô«, ö«, õ«, ú«, ù«, û«, ü«, [dbl dagger]«, ·«, [base ']«, [base "]«, [per thou]«, «, Ê«, Á«, Ë«, È«, Ì«
- But the same problem still remains for script objects if they contain string literals in double-byte code, because they are "outlines" by nature. The character "«" must be encoded if it is at the second byte of a double-byte character, because when executing scripts, Frontier interprets it as the start of a comment. In addition, we must add a supplementary "\" if there are string literals in double-byte code which contain a "\" at the second byte. This is the meaning of the following lines in the script user.html.language.Japanese.scriptPageFilter ():
if str contains "\"" AND str contains "\"
	str = MWU.TwoByteFixer (str, "«", false, theCode) << new in v. alpha 5
	str = MWU.preEvaluate (str, theCode) << new in v. alpha 5
.....
By the second line, the character "«" is encoded as "{char(0xC7)}", i.e. one character is replaced by 11 characters. This problem is not solved in the current implementation of our scripts: although we encode only these characters, the line can still exceed the limit of 255 characters. So please avoid writing long lines with string literals in Japanese in script objects!!
[D]
The script user.html.language.getLanguageDirective () may cause problems with some script objects, or with scripts embedded in outline objects, if they define or undefine the language directive. Here is an example:
local (language = false)
.....
if language == true
	html.data.page.language = "Japanese" « if "language" is true, set language directive
.....
In such a case, user.html.language.getLanguageDirective () will see only the line:
html.data.page.language = "Japanese" « if "language" is true, set language directive
so the language directive will be set to "Japanese", although the logic of the script itself may not want this. In such a case, you should write the script in this way:
local (language = false)
.....
if language == true
	html.data.page.language = "Japanese" « if "language" is true, set language directive
else
	if defined (html.data.page.language)
		delete (@html.data.page.language)
.....
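Why a textual scan behaves this way can be shown with a small Python sketch. The pattern and function name are hypothetical (the real script uses the regex suite and may match differently); the point is only that a scan of the script text sees the assignment regardless of the "if" around it.

```python
import re

# Assumed pattern: a string literal assigned to html.data.page.language.
ASSIGNMENT = re.compile(
    r'html\.data\.page\.language\s*=\s*"([^"]*)"', re.IGNORECASE
)


def scan_language_directive(script_text):
    """Return the first language value assigned anywhere in the text, or None."""
    match = ASSIGNMENT.search(script_text)
    return match.group(1) if match else None


script = '''local (language = false)
if language == true
\thtml.data.page.language = "Japanese"
'''

# The scan reports "Japanese" although the branch would never run,
# because it reads the text and does not follow the control flow.
print(scan_language_directive(script))
```

With the "else" branch shown above, the script itself deletes the directive again at run time when it is not wanted.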
So, the only directive to remember, which you can define in "user.html.prefs", in each web site table or in each web page, is:
#Language "Japanese"
Some technical points...:
- It is important to check whether the language directive is defined at the very first step of the page rendering process, because html.runDirective (), which enters the directives, has at its first line a call to toys.commentDelete (). If a directive in Japanese (or any other double-byte language) contains any character having "«" at the second byte, this directive would be truncated if the script code were not correctly returned by the script user.html.language.scriptCode () before that "toys.commentDelete ()". This is the task of user.html.language.getLanguageDirective (), which calls user.html.language.getNamedDirective (). (This is the main place in this script set where the regex suite is used. This is why we must set regex.options.threadFriendly to true at this point, saving its original value in user.html.language.regexOptionTemp, then restoring the original value at the last step, in the finalFilter ().)
- Our user.html.language.Japanese.pageFilter () now only has to set the value of user.html.prefs.includeMetaCharset to false and temporarily save its original value in user.html.language.Japanese.tempIincludeMetaCharset (that value is restored by our user.html.language.Japanese.finalFilter ()). In fact, the Japanese Mac OS uses S-JIS code, so the meta charset should be defined as "x-sjis", but as there are few browsers that support the charset meta tag, we simply set includeMetaCharset to false.
- On the other hand, html.data.standardMacros.pageheader () refers directly to the value of user.html.prefs.includeMetaCharset to decide whether the charset meta tag is to be included (see these lines in that script:
if user.html.prefs.includeMetaCharset
	local (charset = html.getPref ("charset"))
	add ("<meta http-equiv=\"content-type\" content=\"text/html; charset=" + charset + "\">\r")
). This is a bug IMHO: we should be able to define the value of includeMetaCharset in each of our web site tables or web pages, and so the lines above should be:
if html.getPref ("includeMetaCharset")
	local (charset = html.getPref ("charset"))
	add ("<meta http-equiv=\"content-type\" content=\"text/html; charset=" + charset + "\">\r")
and accordingly, html.getPref () should return a "default" "includeMetaCharset" value, which is false (cf. our patched html.getPref ()).
- Our user.html.language.Japanese.finalFilter () restores the values of user.html.prefs.includeMetaCharset and regex.options.threadFriendly; and if the renderedText still contains "{char(0x..)+char(0x..)}" type macros, it attempts to evaluate them (without referring to the tool tables, etc.).
- Our script user.html.language.scriptCode () is probably the most important one in this script set. It is roughly the equivalent of html.getPref ("Language"), but it supports script objects, and if it fails to find any defined language directive, it returns -1, which stands for the current System script code.
- Our script user.html.language.systemScriptCode () attempts to get the script code of the System ('itlc' resource), and puts it in user.systemScriptCode. But the system script code can be changed dynamically by applications, so it is safer to use -1 as the default script code to pass to the verbs in the MWU suite.
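The lookup-with-fallback behaviour described for user.html.language.scriptCode () can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function name, the use of dictionaries for the page, parent and prefs tables, the two-entry code table (values taken from this ReadMe), and the choice to return -1 for an unknown language name.

```python
# Script codes named in this ReadMe; a real table would list more languages.
SCRIPT_CODES = {"roman": 0, "japanese": 1}
CURRENT_SYSTEM = -1  # "use the script of the current System"


def script_code(page, parent_tables, prefs):
    """Search page, then its parents, then prefs for a language directive."""
    for table in [page] + list(parent_tables) + [prefs]:
        for key, value in table.items():
            if key.lower() == "language":
                return SCRIPT_CODES.get(str(value).lower(), CURRENT_SYSTEM)
    return CURRENT_SYSTEM


print(script_code({}, [], {}))                                   # no directive
print(script_code({}, [{"Language": "Japanese"}], {}))           # from a parent
print(script_code({"language": "Roman"}, [], {"language": "Japanese"}))
```

Falling back to -1 rather than to a cached system code follows the reasoning above: the system script can change while Frontier is running.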
Anyway, if you have any problems using this set of scripts, please write to me -- I can probably fix them.
And anyway, this is still an alpha version -- so please use this script set with caution!
Thank you in advance for any bug reports, suggestions for improvements, and any feedback!
Nobumi Iyanaga Tokyo, Japan
n-iyanag@ppp.bekkoame.or.jp Wed, Sep 3, 1997 at 12:15:45 by NI
Credits:
The MWU suite has been written by Hideaki Iimori.
Our user.html.language.getLanguageDirective () is a modified version of directive.get () written by Jan Storms <jstorms@pi.net>.
The code of user.html.language.systemScriptCode () has been written by Iimori Hideaki.
I have been greatly helped by Yuichiro Sugiura and Philip Suh in the process of making this suite.
Version History
First release Wed, Sep 3, 1997
- Package with MWU suite (v. 1.1b2) by Iimori-san
Second release Thu, Sep 4, 1997
- Two lines have been added in the script user.html.language.getLanguageDirective (), to avoid a rare error.
Third release Fri, Sep 5, 1997
- Two lines have been modified again in the script user.html.language.getLanguageDirective (), to avoid the same error.
- user.html.language.scriptCode () has been simplified for faster rendering.
Fourth release Fri, Sep 5, 1997
- Modified the scripts user.html.language.getLanguageDirective () and user.html.language.getNamedDirective () again, to avoid more errors.
- Added Important Notes: [D] to warn about the weakness of the script user.html.language.getLanguageDirective () in some rare cases.
Fifth release Thu, Sep 18, 1997
- Added at the bottom of the script user.html.language.patchedScripts.buildObject () these lines:
if language != true << Added Mon, Sep 8, 1997 by NI
	user.html.language.lastLanguage = language
	if defined (html.data.page.language)
		delete (@html.data.page.language)
I am not sure if this modification is good or needed.
- Added user.html.language.patchedScripts.runDirective2 (), which uses the new MWU.preEvaluate () verb. This new patched "runDirective ()" is more general and lets users write any directive in Japanese (or any double-byte code). In terms of speed, runDirective2 () and the earlier "runDirective ()" seem to be roughly equal. But practically, the only directive that is likely to be in double-byte code is the title directive. So I leave it to users to choose one or the other "runDirective ()".
- Modified some lines in user.html.language.Japanese.scriptPageFilter (). It uses the new MWU.preEvaluate (), so that this script now encodes only the character "«".
- Rewrote some parts of the ReadMe
- Added the people.[user.initials].print () script to the package
This page was last built with Frontier on a Macintosh on Thu, Sep 18, 1997 at 18:26:07. Thanks for checking it out! Nobumi Iyanaga