[Reader-list] NewsRack multilingualized

Subramanya Sastry sastry at cs.wisc.edu
Fri Apr 7 09:34:29 IST 2006


Hello everyone,

After I fixed a number of encoding-related problems within NewsRack and
worked through the encoding complexities of web documents, NewsRack can now
handle multiple languages simultaneously.

As an example, check:
http://floss.sarai.net/newsrack/Browse.do?owner=demo&issue=Multilingual-Demo

This profile monitors news from 3 feeds:
-> Hindu National (English)
-> BBC Hindi (Hindi)
-> Le Monde International (French)

To specify Hindi keywords/concepts, I used the Indic IME extension for
Firefox.  To specify French keywords/concepts, I copied and pasted words from
the newspaper's website, since I don't know how to input French from my
English-text keyboard.
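
One thing to watch for with pasted keywords: the same accented letter can be
stored either as one code point or as a base letter plus a combining accent,
so a pasted keyword may not match article text byte for byte.  Below is a
minimal sketch of a normalizing comparison, assuming Python and plain
substring matching.  It is not NewsRack's own matching code, and the names
normalize and keyword_matches are made up for illustration.

```python
import unicodedata


def normalize(text):
    """Return text in NFC form, case-folded, so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


def keyword_matches(keyword, article_text):
    """Hypothetical helper: True if the keyword occurs in the article text."""
    return normalize(keyword) in normalize(article_text)


# "\u00e9" is a precomposed e-acute; "e\u0301" is "e" plus a combining accent.
print(keyword_matches("\u00e9ducation", "L'e\u0301ducation nationale"))
```

The escapes also show one way to write accented characters from an
English-text keyboard when copying from a website is not an option.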

The primary requirement for monitoring newspapers in multiple languages is
that the newspaper encode its text in Unicode (or some other standard
encoding).  At this time, the only Indian-language site I know of that uses
Unicode is BBC Hindi.  Most newspapers use their own custom encodings
tied to their fonts.  I have also been told that there is a Firefox
plugin (Padma) which automatically converts text in non-Unicode font
encodings into Unicode for several Indian-language sites (Hindi, Tamil,
Telugu?).  Using the code from this plugin, it should be possible to cover
these other Indian-language newspapers.
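
To make the requirement concrete, here is a minimal sketch of turning the raw
bytes of a page into Unicode text when the site declares a standard encoding.
It assumes Python and is not NewsRack's own code; the names to_unicode and
FALLBACK_CHARSETS, and the choice of fallbacks, are assumptions made for
illustration.

```python
# Illustrative sketch only. The fallback list is an assumption, not a rule.
FALLBACK_CHARSETS = ("utf-8", "iso-8859-1")


def to_unicode(raw, declared_charset=None):
    """Decode raw page bytes into a Unicode string.

    Tries the charset declared by the site first, then the fallbacks.
    """
    candidates = [declared_charset] if declared_charset else []
    candidates.extend(FALLBACK_CHARSETS)
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset name: try the next candidate.
            continue
    # Only reached if the fallback list is changed; keep undecodable bytes visible.
    return raw.decode("utf-8", errors="replace")


hindi_bytes = "\u0938\u092e\u093e\u091a\u093e\u0930".encode("utf-8")
print(to_unicode(hindi_bytes, "utf-8"))
```

A page in a custom font encoding would still decode without an error here.
Its bytes would simply map to the wrong characters, which is why a
font-specific conversion such as the one in Padma is needed for those sites.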

There are still a couple of minor problems, which will be fixed in time.
This is still work in progress, but the first step in multilingualizing
NewsRack has been accomplished.

Subbu.


