[Reader-list] Monthly report

surekha at servelots.com surekha at servelots.com
Sat Jun 26 11:43:57 IST 2004


Hello all,

This is my third monthly posting for the project titled 
"Multilingual support for web applications using server-side java".

Surekha.

***************************************************************************

    Multi-lingual support for Web applications using Server Side Java.
    ------------------------------------------------------------------

A collaborative work of Surekha Sastry and K.Srinivasa Raghavan.

Input Method Editor (IME):
--------------------------
We have been working on making the IME available in the web browser
itself rather than on a single web page. This will allow the user to type
in any Indian language on any web page. E.g., if I would like to type
my email in Kannada in my "yahoo" mail, I can do so by invoking the IME,
which is a link in the browser. The IME can be made available in any
browser through the concept of "Bookmarklets".

Bookmarklets (http://www.bookmarklets.com) are small, reusable JavaScript
routines that you can save on your computer in your browser's Bookmark
(or Favorites) section. Hence the name bookmarklet. Bookmarklets work on
all platforms and there's no special software to download. All you need is
a JavaScript-enabled browser. There is a limit to the number of
characters a bookmarklet can contain, and this limit differs between
browsers and browser versions. For example, Internet Explorer 6 can
handle up to 508 characters.

Since our IME (Input Method Editor), implemented in JavaScript, exceeds the
character limit of a bookmarklet, we resolved the problem using the
following procedure.

1. The html page (languageSelection.html), which brings up a drop down list
of Indian languages, and the JavaScript file (IndicInput_IME.js),
which contains the IME logic, are kept on a web server. The html page
(languageSelection.html) refers to this JavaScript file
(IndicInput_IME.js).

2. The bookmarklet is just a link to this html page.
This is how a bookmarklet looks.

----------------------------------------------------------------------------
<html><body><a
href="javascript:(function(){window.open('http://localhost/languageSelection.html','mywindow','HEIGHT=50,WIDTH=50,resizable=1');})();">IME
Bookmarklet</a></body></html>
--------------------------------------------------------------------------

Just drag this link to your favorites section or bookmarks to store it as
a bookmarklet.

3. When the user clicks on this bookmarklet on any web page, it brings up
   a drop down list of Indian languages. When the user selects a language,
   the JavaScript code is executed and the user will be able to type in
   his/her language.

Technically, when the user clicks on the bookmarklet, the html page
(languageSelection.html) pops up in a new window and downloads the
JavaScript file (IndicInput_IME.js). This script then captures
'KeyPress' events on the web page being viewed. We were successful in
creating a bookmarklet which works with a local web server.
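
The bookmarklet described above is just a short "javascript:" URL, so
its length can be checked against a browser's limit before it is handed
out. The following is a minimal sketch of that idea, not our actual
code: the helper names buildBookmarklet and fitsLimit are made up for
illustration, and only the window.open call and the 508 character
figure come from the text above.

```javascript
// Hypothetical helpers; only the window.open call shape is from the post.

// Build the "javascript:" URL that opens the language selection page.
// The call is wrapped in an immediately invoked function so that the
// return value of window.open does not replace the current page.
function buildBookmarklet(pageUrl, windowName, features) {
  return "javascript:(function(){window.open('" + pageUrl + "','" +
         windowName + "','" + features + "');})();";
}

// True when the bookmarklet is within the given character limit.
function fitsLimit(bookmarklet, limit) {
  return bookmarklet.length <= limit;
}

var IE6_LIMIT = 508; // figure quoted above for Internet Explorer 6

var link = buildBookmarklet(
  "http://localhost/languageSelection.html",
  "mywindow",
  "HEIGHT=50,WIDTH=50,resizable=1"
);

console.log(link);
console.log(fitsLimit(link, IE6_LIMIT));
```

Because the bookmarklet only opens a page and the IME logic itself is
loaded from the server, the link stays well under the limit however
large IndicInput_IME.js becomes.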

If the web page is from a different web server (say yahoo mail), then the
browser does not allow the script to capture events, for security
reasons. This problem can be solved by signing the JavaScript file
(IndicInput_IME.js). To do this, we need a certificate from an authorized
Certificate Authority (like Verisign), or we can generate a self-signed
certificate for testing purposes.

1. We tried generating a test certificate using Netscape Sign Tool 1.3
using the following command:

signtool -G myCertificate -k myCertificate -d NetscapeCertificates/

where myCertificate --> nickname of certificate
and NetscapeCertificates --> directory containing netscape databases
(cert7.db and key3.db)

2. We then imported the certificate to the browser.

3. We signed the html page (languageSelection.html) and the javascript
file (IndicInput_IME.js) with the generated certificate.

To sign the files, we used the following command of the
Netscape Sign Tool 1.3:

signtool -d <directory in which generated certificates are stored>
-k <certificate nickname> -Z <jar file name> <directory which
contains files to be signed>

E.g.: signtool -d NetscapeCertificates -k myCertificate -Z myScript.jar
myScripts

This command creates the jar file (myScript.jar) and signs all the files
present in the JAR with the certificate "myCertificate".

4. The bookmarklet is changed to refer to this JAR.
-------------------------------------------------------------------------
<html><body><a
href="javascript:(function(){window.open('jar:http://<webServer>[:port]/myScript.jar!/languageSelection.html','mywindow','HEIGHT=50,WIDTH=50,resizable=1');})();">IME
Bookmarklet</a></body></html>
----------------------------------------------------------------------------

The <SCRIPT> tag of the languageSelection.html file is changed to:

<script archive="http://<webServer>[:port]/myScript.jar" id="loadScript"
src="jar:http://<webServer>[:port]/myScript.jar!/IndicInput_IME.js"
language="Javascript"></script>

We have added the following lines to the javascript (IndicInput_IME.js)
file

try {
  netscape.security.PrivilegeManager.enablePrivilege("UniversalBrowserWrite");
  enableExternalCapture();
}
catch(err) {
  document.write(err);
}

Problem:
-------
When we invoke the bookmarklet on a page from a different web server,
we get the error message "Exception: Component returned failure code:
0x80004005(NS_ERROR_FAILURE)[nsIDOMJSWindow.enableExternalCapture]".

Any help/suggestions are welcome.


Data Storage:
------------

The multi-lingual converter for ISCII-UNICODE-UTF8 is ready and has been
released into the public domain
<http://sarovar.org/projects/codeconverters/>. We will soon put up a demo
page for the converters.
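
To illustrate the last step of such a conversion (Unicode to UTF-8),
here is a minimal sketch. It is not taken from the released converters:
the function name utf8Bytes is made up for illustration, it handles only
code points up to U+FFFF (which covers all the Indic blocks), and the
ISCII to Unicode step is left out because it needs a lookup table.

```javascript
// Hypothetical helper, not part of the released converters.
// Encode one Unicode code point (0..0xFFFF, surrogates not handled)
// as an array of UTF-8 byte values.
function utf8Bytes(codePoint) {
  if (codePoint < 0x80) {
    // 1 byte: 0xxxxxxx
    return [codePoint];
  }
  if (codePoint < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xC0 | (codePoint >> 6),
            0x80 | (codePoint & 0x3F)];
  }
  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
  return [0xE0 | (codePoint >> 12),
          0x80 | ((codePoint >> 6) & 0x3F),
          0x80 | (codePoint & 0x3F)];
}

// Devanagari letter KA (U+0915) becomes the bytes E0 A4 95.
console.log(utf8Bytes(0x0915).map(function (b) {
  return b.toString(16);
}).join(" "));
```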


Display Issues:
--------------

We have analyzed the problems in using non-unicode fonts for displaying
Indian Language characters in the browser.

Problems in using non-unicode fonts:
------------------------------------

1. In the case of non-unicode fonts like Shusha, a character may not be
formed by a single glyph; instead, it may be a combination of several
glyphs. Also, the cursor positioning is improper with these fonts.

2. We need to develop a conversion (lookup from keyboard keys to the
corresponding font glyphs) in JavaScript for the IME. The conversion
functions will again be different for different Indian language fonts.
In Unicode, however, the code blocks for the Indian scripts are placed
at offsets of 128 from one another. So, by merely manipulating the
offset, we get the corresponding character in another Indian language,
except in languages like Tamil, where the number of characters is
relatively small.

3. Data sorting at the display end will give strange results since the
sorting becomes glyph based rather than character based.

4. In the case of non-unicode fonts, we'll have to write the logic for a
display engine that renders the vowel signs and conjuncts. In the case of
Devanagari, there are about 12,000 combinations of conjuncts with
matras. The number of cases may differ from script to script.

Again, the display engine will be different for different non-unicode
fonts for the same Indian language.
For example, the display engine designed for Shusha font will not work
with DV-TTYogesh font for Devanagari script.
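
The offset manipulation mentioned in point 2 can be sketched as follows.
This is only an illustration, not the code in IndicInput_IME.js: the
function name shiftScript is made up, and the two block start values are
those of the Unicode Devanagari (U+0900) and Kannada (U+0C80) blocks.

```javascript
// Start of each 128-code-point Unicode block used in this sketch.
var DEVANAGARI_BASE = 0x0900;
var KANNADA_BASE = 0x0C80;
var BLOCK_SIZE = 128;

// Hypothetical helper: move every character that lies inside the source
// block to the same position in the target block. Characters outside the
// source block (Latin letters, digits, spaces) are copied unchanged.
function shiftScript(text, fromBase, toBase) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    var offset = code - fromBase;
    if (offset >= 0 && offset < BLOCK_SIZE) {
      out += String.fromCharCode(toBase + offset);
    } else {
      out += text.charAt(i);
    }
  }
  return out;
}

// Devanagari KA + vowel sign AA (U+0915 U+093E) maps to
// Kannada KA + vowel sign AA (U+0C95 U+0CBE).
var kannada = shiftScript("\u0915\u093E", DEVANAGARI_BASE, KANNADA_BASE);
console.log(kannada);
```

Note that not every position is assigned in every script, so a real IME
would still need per-script exceptions, as in the Tamil case above.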

So, considering the above points, we feel it is not a good idea to work
on the IME with non-unicode fonts. Any suggestions/comments are welcome.

Instead, the proposed solution is to develop a rendering engine for
Netscape and Mozilla that would properly display complex scripts with Open
Type Fonts (OTF). Internet Explorer uses "Uniscribe" (a rendering engine
developed by Microsoft) to render complex scripts for OTF. For more
information about Uniscribe, please refer to
http://bhashaindia.com/knowledge/glyph/uniscribe.aspx .
