Indian Language Computing – Where and How? (article written for i.t. Magazine)
(A draft copy of an article written for i.t. Magazine, published in Nov. 2007. Website: http://www.itmagz.com)
When I first visited the Electrical Engineering department at IIT Bombay, I was surprised to find that every lab in the department displayed a Hindi name alongside its English name; the “Power Systems Lab”, for instance, was also called “शक्ती प्रणाली प्रयोगशाला”. The question that came to my mind then was “Does it matter?”. About a year ago, when I was telling a friend that I should be able to write a blog in Marathi, my mother tongue, the same question popped up again: “Does it matter?”. In the former case it probably doesn’t; in the latter case it probably does. Being able to express oneself in one’s own language is fundamental to communication, whereas attaching an alternate name to an existing English name does not necessarily mean much.
When someone at i.t. Magazine asked me whether I could write something about ‘Indic Language Computing’, a couple of questions popped up again. The first was “Why me?”, but since I am already writing here, I believe that issue is settled, for now at least. The next question, an important one, was “What exactly is Indic Language Computing?” Is computing really related to language at all? Arithmetic and algebra, for instance, are not. Without getting into academic definitions of the words, here is my take: to me, Indic Language Computing means being able to do stuff in Indian languages on ‘computers’. Once one resorts to this rather simplified definition, the question “Does it matter?” answers itself: it surely does matter.
Now, what I mean by being able to do ‘stuff’ is being able to edit documents, send emails, chat with friends and, of course, write blogs. In short, most if not all of what I am able to do in English today, and much more, I should be able to do in Indian languages or, to be specific, in “my language”. Since the definition of computers should, going forward, also include mobile devices, the amount of ‘stuff’ is not restricted to what has been mentioned above. Surely this is important, but it is not quite “there” yet. Let’s try to see why it has remained that way so far.
First, let’s look at India as a marketplace for Indic Language Computing. As per Wikipedia, there are about 60 different spoken languages in India, with the number of speakers ranging from tens or hundreds of millions for languages like Hindi, Bengali, Telugu and Marathi to fewer than a few hundred thousand for lesser-known languages. Moreover, the users are segregated into what I call language-speaking communities. This fragments the existing marketplace, and the fragmentation is further aggravated by extremely high illiteracy and basic problems of infrastructure. Thus the solutions that can be created effectively have a potential audience of about 20 million, give or take a few. Juxtapose this with marketplaces like the US (200 million plus), or for that matter Japan or some of the European countries. So although India appears to be a billion-plus marketplace, and certainly is one for ‘soaps and shampoos’, it is not necessarily one for Indic Language Computing. This is a major hurdle for investment and thus for the solutions that would follow from it.
The next set of problems comes from the inherent complexity of the Indian languages, be it in rendering the complex characters on computers, creating suitable input mechanisms, or storing and retrieving information in these languages. Each of these poses a separate technical challenge. Work to address these challenges is ongoing, and a lot has been achieved so far. For instance, Unicode provides a standard mechanism for representing Indian-language characters in the ‘language’ a computer understands, and it is a widely accepted and implemented standard. Popular operating systems like Windows and Linux offer good rendering support and Unicode-compliant fonts, and thanks to different transliteration schemes it is possible to have a reasonable input mechanism as well. The challenge here is not ‘availability’ but ‘awareness’ and a ‘well packaged solution’. Looking at the evolution of computing, the Web is the way forward; will that hold true here as well?
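To make the point about Unicode concrete, here is a minimal sketch in Python (the choice of language and the variable names are assumptions made for illustration only, not something the article prescribes). It takes the first word of the lab name quoted earlier, “शक्ती”, and shows the sequence of standard code points a computer stores for it, along with the bytes produced by the UTF-8 encoding.

```python
import unicodedata

# The word "शक्ती", spelled out as explicit Unicode escapes so that the
# example does not depend on how this page itself is encoded.
word = "\u0936\u0915\u094d\u0924\u0940"

# Every character has a fixed number (code point) and an official name.
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The same text as bytes, using the UTF-8 encoding common on the Web.
utf8_bytes = word.encode("utf-8")
print(len(word), "code points,", len(utf8_bytes), "bytes in UTF-8")

# Decoding the bytes gives back exactly the same text.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == word)
```

Note that the word looks like three shapes on screen but is stored as five code points, including the virama (U+094D) that joins “क” and “त” into a conjunct. Turning that stored sequence into the right shapes is the rendering problem described above, and it is handled by the fonts and the operating system rather than by the text itself.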
Having said that, I personally think the picture is not so gloomy. Quite the contrary: each of these challenges creates opportunities, and addressing those opportunities might have a significant social as well as economic impact. For instance, if Indic Language Computing actually addresses the problem of education at a very basic level, literacy can be improved substantially, and this can then create a self-fueling growth story. Further, there is a large number of what I call “special interest groups”, for instance people who are interested in literature or music in specific languages. There is certainly an opportunity to address the needs of these special interest groups, and this can be done in a preemptive manner like “YouTube” or “Flickr”.
To conclude, here is my final take on Indic Language Computing. Indic Language Computing, as defined above, is at the stage where satellite television was in the early nineties, when it was dominated by English channels. After fifteen-odd years the picture has turned around completely: the English channels and their viewership are relegated to a small fraction, and the major channels are now vernacular. It is not going to be as straightforward for Indic Language Computing, as the needs addressed are completely different, but that appears to be how things will shape up in the coming ten to fifteen years. At least I am tempted to think so!
Good article. I like the way you cover all aspects of the issue, and I would vote in favor of it. Although it looks complex, as far as my understanding of computing goes we can overcome it by developing a universal Language API, which should be able to address the smaller issues efficiently.
Harry
November 11, 2008 at 11:29 am