[Reader-list] From Kannada to keyboards...

Mon Dec 17 17:13:02 IST 2001

FROM KANNADA TO KEYBOARDS: AN INDIAN LANGUAGE ENTERS THE CYBERAGE

By Frederick Noronha fred at bytesforall.org

For Dr U.B. Pavanaja, an unlucky 1993 scooter accident turned out to be the
proverbial blessing in disguise. For nine months as he lay immobilised in
bed, the scientist learnt Visual Basic.

Lying flat on his bed, with a computer alongside, he then went on to
write the first versions of what is now his 'Kannada Kali' software
programme. This is a game that helps a child or new learner of Kannada,
the language of the southern Indian state of Karnataka, to shape its
letters properly.

"I did it lying on the bed with a computer by my side," he recalls with a
smile. Over the years, as he has stepped up work on Indian regional
language computing, the one-time scientist at India's prestigious atomic
research centre has found his output increasingly relevant to the common man.

Currently he's at the helm of the Kannada Ganaka Parishat (or, Kannada
Computer Association). This is a voluntary organisation formed by
computer professionals, literary persons and others to promote the
standardisation and usage of the Kannada language on computers.

It's probably important not to underestimate the size of this task.

Kannada is the language of some 47 million people worldwide -- more than the
number of Polish speakers, and just below the number of Ukrainian speakers.
Besides, the lessons learnt with Kannada could have important implications
for other prominent Indian languages whose speakers number in the millions:
for instance, Hindi (496 million), Bengali (215 million), Urdu (106
million), Punjabi (96 million), Telugu and Tamil (75 million each), and
Marathi (72 million).

"There is so much talk about computing for the common man. But the main
problem that everyone seems to overlook is that the common man (especially in
countries like India) speaks in languages other than English," as Dr
Ubaradka Bellippady Pavanaja reminds us. (Both his first names are
village-names, and in the South Indian style, are generally not spelt out in
full.)

So, for the past many years, he's been sweating over this front.
Some solutions are simple, why-didn't-we-think-of-it-earlier ways out.
Others are attempts to do the groundwork and undertake standardisation that
could have far-reaching implications for the future.

So far, standardisation has been completed both for a uniform Kannada
keyboard and for the glyphs and glyph codes. (The latter refer to the
component parts that, when joined together in varying combinations, make up
each letter.)

There's a big difference between English and Indian languages in the
display and storage of information in computers. In the case of English,
there is a one-to-one correspondence between the display codes and the
storage codes. But in the case of an Indian language, say Kannada, the
letters are made up of combinations of consonants and vowels: for example,
a consonant-plus-consonant-plus-consonant-plus-vowel combination.

These characters each have a unique storage code in ISCII, or the Indian
Script Code for Information Interchange. Display of these characters is
accomplished by joining pieces of characters known as 'glyphs'. Codes for
the storage characters and the display pieces (glyphs) are different.

In addition, the number of stored characters that make up a letter and the
number of display pieces used to show that letter simply don't have a
one-to-one correspondence.

An example: the Kannada language uses some 142 pieces to obtain all the
possible combinations that can be formed from the 49 basic Kannada
letters.
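The gap between what is stored and what is seen can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
it uses Unicode code points in place of ISCII storage codes, the helper
names are hypothetical, and the grouping rule is a simplification rather
than the glyph tables standardised in Karnataka. It shows six stored codes
collapsing into one written syllable.

```python
# Simplified sketch: group stored Kannada code points into written syllables.
# Assumption: Unicode code points stand in for ISCII storage codes here.

VIRAMA = "\u0ccd"  # sign that joins a consonant to the one that follows


def is_consonant(ch):
    return "\u0c95" <= ch <= "\u0cb9"


def is_dependent_sign(ch):
    # Vowel signs, length marks, anusvara and visarga attach to a syllable.
    return "\u0cbe" <= ch <= "\u0ccc" or ch in "\u0c82\u0c83\u0cd5\u0cd6"


def split_syllables(text):
    syllables = []
    for ch in text:
        joins_previous = bool(syllables) and (
            ch == VIRAMA
            or is_dependent_sign(ch)
            or (is_consonant(ch) and syllables[-1].endswith(VIRAMA))
        )
        if joins_previous:
            syllables[-1] += ch
        else:
            syllables.append(ch)
    return syllables


# consonant + virama + consonant + virama + consonant + vowel sign
word = "\u0cb8\u0ccd\u0ca4\u0ccd\u0cb0\u0cc0"
print(len(word), len(split_syllables(word)))  # prints "6 1"
```

How many glyph pieces a font then uses to draw that one syllable is a
separate question again, which is why shared glyph codes matter.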

In the past, Indian groups working on language solutions -- like the
Pune-based, government-backed C-DAC and Mithi, which specialises in local
language computing, also from Pune -- have done similar work. But in
earlier cases, everyone followed their own glyph sets.

This meant data lacked 'portability'. Text composed on one computer could
not be carried over, or understood by, another computer which did not share
the same software. This was a great handicap in a world where the ability of
computers to 'talk to one another' has made them into the powerful tool they
currently are.

"We feel the best solution is to have the storage in ISCII. Other solutions
have attempted to tie up the user in their own software solutions," says Dr
Pavanaja.

He says that the Government of India's stand is that ISCII should have
standardised glyph sets. "In our region, the Government of Karnataka has
standardised glyph sets already. We have benchmark software too... to ensure
that the software would work with any standard computer." Admits Dr
Pavanaja: "Standardisation is something that has to be imposed (for the sake
of moving ahead together)."

At another level, Kannada-language advocates have also pushed for what they
call the Kannada Standard Code for Language Processing. This is used for
sorting, according to the Kannada alphabetical order.

"Sorting is a very important job for computers. Can you think of a single
database operation without sorting and indexing?" asks Dr Pavanaja. "For all
these years, using computers for Kannada-work meant simply using them for
typing, making books, printing invites and DTP (desktop publishing) work. It
has now changed," points out Dr Pavanaja. Sorting and indexing in the
regional language, he argues, has opened up new possibilities.
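Sorting by a language's own alphabetical order usually comes down to a
collation table. The sketch below is a guess at the general shape, not the
Kannada Standard Code for Language Processing itself: the table, the
placement of anusvara and visarga, and the function names are all
assumptions, and Unicode code points are used for convenience.

```python
import unicodedata


def assigned(first, last):
    # Keep only code points that Unicode actually assigns in this range.
    return [chr(c) for c in range(first, last + 1) if unicodedata.name(chr(c), "")]


# Hypothetical collation table, not the Kannada Standard Code itself.
ALPHABET = (
    assigned(0x0C85, 0x0C94)      # independent vowels
    + ["\u0c82", "\u0c83"]        # anusvara, visarga (placement assumed)
    + assigned(0x0C95, 0x0CB9)    # consonants
    + assigned(0x0CBE, 0x0CCC)    # vowel signs
    + ["\u0ccd"]                  # virama
)
RANK = {letter: position for position, letter in enumerate(ALPHABET)}


def collation_key(word):
    # Listed letters sort by table position; anything else sorts after them.
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]


words = [
    "\u0cae\u0ca8\u0cc6",              # mane
    "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1",  # kannada
    "\u0c85\u0cae\u0ccd\u0cae",        # amma
]
print(sorted(words, key=collation_key))  # amma, kannada, mane
```

Because the order lives in a table rather than in the storage codes, the
same key function can feed a database index as easily as a sorted list.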

C-DAC (the Government of India-backed Centre for Development of Advanced
Computing) earlier had solutions, but these, he says, were not particularly
suitable for the Kannada language. That attempt evolved a national standard
based on Hindi, whereas every language of India has its own specialities and
requirements.

At another level, the Parishat has been working towards a standardised
Unicode for Kannada. "KGP general secretary Srinatha Sastry and myself put
together a document, and sent it to the Unicode Consortium. It was partly
accepted," says Dr Pavanaja. He underlines the importance of uniformity for
the Unicode character table and collation code for this regional language.

Incidentally, India's voting member at the Unicode Consortium is the Indian
government's Ministry of Information Technology (MIT). But lack of uniform
interests among the various Indian languages used for computing means that
sometimes not much can be done on this front.

In September 2000, Dr Pavanaja took part in a Unicode conference in
California. "We explained the issues (involved in Kannada), and that was
appreciated a lot. The MIT is waiting for all languages to come up with a
decision. Only Kannada has done this much groundwork on Unicode. At least
Kannada could be implemented on Unicode for now (instead of waiting for all
Indian languages to finish their task)."

The Parishat has also developed free Kannada script software. This was
released in October 2001 in Bangalore.

"It has got SDK (the software development kit) as part of it. But most
importantly, it comes free (in terms of price)," stresses Dr Pavanaja. He
suggests that this is important too in a price-sensitive region like India,
where millions still live in poverty.

Using this, developers can write Kannada database applications. It could,
therefore, have applications linked to phone directories, ration cards,
banking, libraries and even road-transportation operations. This could have
an immense impact on the large state of Karnataka, which has a population
roughly the size of South Africa's, and more than half the land area of
Germany.

"Everyone needs good database applications. In Indian language computing,
90% of the uses are linked to DTP unfortunately. But in English, computers
are overwhelmingly used for database applications," says he, stressing that
the lack of applications also causes problems.

Whether it's e-commerce, business transactions or public utilities and
governance, all these sectors need good database applications, stresses Dr
Pavanaja.

One of this team's solutions is called 'Kalitha'. It is a Kannada keyboard
driver and font. "It also has a sorting engine, not just a sorting-facility.
This is the first time that any Indian language had this facility," says Dr
Pavanaja.

This group, led by Srinatha Sastry, has modified a Kannada keyboard layout
originated by K.P. Rao. It uses the 26 English-language keys for Kannada's
49 letters. "Even Bill Gates appreciated (the concept behind) such a
layout for a keyboard," says Dr Pavanaja.

But just how does it work? The 'shift' (or 'caps') key comes to the rescue.
"English has 26 alphabets multiplied by two (with each using the caps key).
This makes a total of 52. In Kannada, we need only a total of 49. It works
well with the 'shift' and 'unshift' key," says he. This layout has been
accepted and notified by the Karnataka government.
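The arithmetic is easy to check in Python. The sketch below is only an
illustration: the letter names are placeholders and the slot order is
arbitrary, since the actual key assignments of the notified layout are not
described here.

```python
import string


def build_layout(letters):
    # One slot per (key, shifted?) pair: 26 unshifted plus 26 shifted.
    slots = [(key, shifted) for shifted in (False, True)
             for key in string.ascii_lowercase]
    if len(letters) > len(slots):
        raise ValueError("more letters than key/shift combinations")
    return dict(zip(slots, letters)), len(slots) - len(letters)


# Placeholder names; the real layout's key assignments are not given here.
letters = [f"letter_{n}" for n in range(1, 50)]
layout, spare = build_layout(letters)
print(len(layout), spare)  # prints "49 3"
```

All 49 letters fit, with three key-and-shift combinations to spare.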

In order to keep things simple for the typist and computer-operator,
this keyboard makes things a "little more difficult" for the
programmer. But once that is taken care of, things become simple in
actually using this solution.

Besides his technical work, this man's own story is also interesting.

Dr Pavanaja, currently 42, is a PhD in chemistry. He was a scientist at the
Bhabha Atomic Research Centre (BARC) in Bombay. "We used computers
extensively, in lab-automation and we also experimented in connecting a lot
of lab equipment to computers," he recalls.

Using computers "as a tool" for his scientific work for a while, he says he
"got addicted". His own efforts took the chemical scientists closer to the
computer in the early days of the PCs.

"I soon became seen as a computer professional," he recalls of times in the
mid-eighties, when the PC first began to make its appearance in the Indian
scientific establishments.

In BARC, a group to promote the Kannada language often faced difficulties in
publishing technical articles in its Kannada-language science magazine. That
set him thinking. "While doing our magazine 'Belagu' (whose name loosely
translates to 'Shine' or 'Reflect Light'), we decided to buy our own DTP
package."

In 1995, a visit to Taiwan for advanced research revealed that
professionals there used computers heavily, and overwhelmingly in Chinese.
"If they could use their language, why not we?" thought Dr Pavanaja.

Soon, he became active on Internet 'news' groups like
soc.culture.indian.karnataka and also set up websites. What happened
afterwards is narrated in terms of the output achieved and listed above.

"When I was a scientist, I felt my doctorate had no use. I was hardly doing
any (socially-relevant) work. Now, I don't feel guilty about that anymore,"
he says. He returned from Taiwan in 1996 and resigned from BARC in June
1997.

In 1998, his work made Kannada one of the first Indian languages to
use dynamic fonts. He explains: "Earlier, if you wanted to browse a
web-site, you needed the (same font used by the site) to be installed
on your PC."

That is obviously a real problem in a region where dozens or hundreds of
non-standardised fonts exist for each language. It meant downloading the
font, and doing so again each time you used a different computer!

Dynamic fonts solve the problem by residing on the 'server', not on the
'client' (or user's computer). When you browse a site, you automatically
pull the font info the first time you browse it. Also, it works with any
operating system you're using, Dr Pavanaja points out.

"In English, you don't have the problem of clashing glyphs. If you use
a fancy font, you can still read it at least in Times or Arial...," he
notes.

Pavanaja has also created a Kannada version of LOGO. "LOGO stands for
'logic-oriented, graphic-oriented' programming. It is a language for
children. It uses very simple commands, like 'forward', 'backward', and so
on. School children of the fifth to eighth standards (roughly 10 to 13 years
of age) can use it effectively. I thought of Kannada-medium schools, and
wanted something for them," says Dr Pavanaja.

Work done by this group could make Kannada the first Indian language to get
onto a palm-top computing device, believes Dr Pavanaja. "Much of the coding
(for some of our projects) has been done by K.M. Harsha, a 22-year-old
mechanical diploma holder from a village," he points out. This, says the
scientist, only underlines the creativity of youngsters if given the chance.
It challenges the myth that city-born children are more intelligent!

One of the KGP's dreams is to have Kannada working with the 'free' and 'open
source' Linux operating system, which was largely built up by volunteers
worldwide. "But that could take some time," concedes Dr Pavanaja. "We need
to have keyboard drivers, fonts, a toolkit for software developers, a free
office suite like Star Office, and even the complete Linux working in
Kannada," he adds. Getting legal copies of proprietary software would cost
millions for a state the size of Karnataka.

"So far the KGP has been taking its funding from the government,
semi-government institutions, the corporate world and philanthropy. We need
to develop software and make it available freely (so as to make it
affordable to the common man in a country where millions still live in
poverty). We
don't sell anything," says Dr Pavanaja.

Says Dr Pavanaja: "If you don't put Indian languages into the computer, all
our tongues will get relegated to being just spoken languages in five to ten
years' time."

Currently the editor of 'Vishva Kannada', which he terms the world's first
Internet magazine in the Kannada language, Dr Pavanaja can be contacted at
<pavanaja at vishvakannada.com>. This magazine's site can be visited on the
World Wide Web at www.vishvakannada.com (ENDS)