BITS and Pieces

Get global. Get ahead.

Raiomand Doctor

Technical advisor for Natural Language Processing

He is an exemplary connoisseur of languages and computational linguistics. An ardent teacher, he is considered the ultimate feather in the cap of the students who have had the privilege to train under him. He was recently presented with a ‘Special Recognition Award’ on 30 September, on the occasion of International Translation Day, to acknowledge a lifetime of dedication to the world of languages and its various possibilities. Lo and behold, this month BnP talks to the legend himself – Raiomand Doctor – about his zeal for machines and languages, his passion for his work and how he never stops learning.

How did you first step into the world of languages?

My dad used to teach French, so I was automatically drawn to it. Having specialised in Linguistics, I initially took care of the French department at Fergusson College for a while, then moved to the University. I then became the Head of the Foreign Languages Department at the erstwhile University of Pune (now SPPU). This was from 1979 until 2005. In 2005, I took early retirement because I wanted to focus more on what I’m currently doing at the Centre for Development of Advanced Computing (C-DAC).

Could you please tell us more about your work at C-DAC?

Established in 1987, this centre is a national mission that basically gives Indian languages a digital platform. I started working with them in 1985, but its formal initiation was not until 1987. Back then, I was working on and off with them, since I was also working in France. I was what they call a chargé de recherche, in charge of research at the Collège de France. That’s where I worked a lot with Old Iranian and wrote all my books and papers, owing to the very peaceful environment, good access to libraries and great people to work with.

So when I returned, I decided to move over to C-DAC full time and since then, I’ve been working extensively on the policy framework for Indian languages. My field is basically called Natural Language Processing. We also started Unicode for Indian languages and have been working closely with ISO on neo-Brahmi scripts as well as Urdu. Moreover, we work on standards a lot, since standards are what make the machines stick (and talk!). Unless everything is in a perfectly standard format, two machines will not be able to understand or talk to each other. You cannot do natural language processing of any language unless its data complies with the standards.
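To make the point about standard formats concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that two systems store the same Devanagari letter in two different but equivalent Unicode forms; the letter chosen and the helper name same_text are hypothetical and not taken from C-DAC's work. Without an agreed normal form the two strings do not match, and after normalisation they do.

```python
import unicodedata

# Two encodings of the same Devanagari letter "qa" (an illustrative choice):
precomposed = "\u0958"        # single code point: DEVANAGARI LETTER QA
decomposed = "\u0915\u093C"   # DEVANAGARI LETTER KA followed by SIGN NUKTA


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to Unicode NFC form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A raw comparison sees two different code point sequences.
raw_match = precomposed == decomposed

# Once both sides follow the same normal form, they compare equal.
normalised_match = same_text(precomposed, decomposed)

print(raw_match, normalised_match)
```

The raw comparison fails while the normalised one succeeds, which is the kind of mismatch that agreed standards are meant to prevent.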

I notice how you always use “we” and not “I”. So you really like working with your team, I’m guessing?

Absolutely! And I say this because under no circumstances is it a one-man job. There’s always team involvement, and teamwork is extremely crucial in any software industry. There is no room for politics or for people with huge ego issues, nor can you have anyone saying “I am old hence I know more than you”. I believe everything is always a learning experience, something I have always maintained in my classrooms as well. I never claimed to know everything. My students were always free to raise their hand and tell me that I was wrong, and I would accept it readily, provided a good discussion followed.

The same applies in this case too. I feel I always have a lot to learn from my colleagues since they are trained in newer technologies. So I have to make that extra effort and learn all this and forget the old. Because learning is a constant experience.

So you mentioned that you also advise ICANN (Internet Corporation for Assigned Names and Numbers) regarding Indian URLs. Could you elaborate on that please?

We work closely with ICANN on names in Indic scripts and yes, we advise ICANN regarding URLs in Indian scripts. That is actually one of the greatest achievements so far. Most Indian URLs normally end in .gov, .mil, .org, or a standard .in. Today you can have your URL ending in .bharat.
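As a rough illustration of how a name in an Indian script can travel over an ASCII-only naming system, the sketch below converts a label to its ASCII-compatible form and back. It assumes the Devanagari label भारत as a stand-in for a ".bharat" ending, and it uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules; it is not a description of the process ICANN or C-DAC actually follow.

```python
# Assumption: this Devanagari label stands in for a ".bharat" ending.
label = "भारत"

# ASCII-compatible form of the label, as bytes.
ascii_form = label.encode("idna")

# Converting back recovers the label in its original script.
round_trip = ascii_form.decode("idna")

print(ascii_form, round_trip)
```

The encoded form begins with the prefix xn--, which marks a label as an internationalised name.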

That’s a really good balance of the IT world AND the language world! So where exactly does the processing come in?

Apart from all this, we also work a lot with data analytics. There is a lot of raw data available on the net, and our job is to ensure that all this raw data is properly structured so that it is easily accessible to a machine. This uses something known as document summarisation. The priority, of course, is Hindi. We also have other language experts here, such as for Tamil, Telugu, Kannada, etc., who are not translators but language experts, or rather linguists, working in that area. I mainly handle Hindi, Marathi, Gujarati and Urdu, and recently also started working with Sindhi. So that broadly covers what I do, one of my interests being the application of big data analytics and localization to Indian languages.

And could you explain ‘Natural language processing’?

It is basically the interfacing of machines and humans. Our main job here is localization. We do it on an extremely large scale, even though I personally don’t work with it that closely. Today in villages you are able to have all your land records and birth and death records in your own mother tongue; even e-governance can be done that way, and it is indeed possible to have it in Marathi. We are tied up with the ministry of information and communication technology, which funds us to carry out such sensitive work. Apart from that, we also work on translating sign language to text, on video subtitling and broadly on anything that you could possibly require in relation to Indian languages – we try to provide everything that a user may require so that he/she can work comfortably in his/her mother tongue.

You told us earlier that you lectured exclusively in Europe. Was that purely for languages?

I have two areas of specialisation. Europe works a lot with Sanskrit and Indian languages. Hence there are often problems, because people don’t know how to handle Indian-language software or even work with that language. So I was called to work on that matter, and I gave lectures and talks about the structure of Indian languages in addition to their scope and usage. I also gave a talk in Cambridge about transferring the Old Iranian Avesta to the digital platform – how you could put the entire set of religious texts on the web and digitise it. You can then search and read and what not.

Apart from that, I was also called, on and off, to talk about Math, which mainly encompassed String Theory and Chaos Maths and their application to linguistic theories. I was also an adviser to the prestigious library BULAC in Paris, which adopted ISO standards for its library content as well as its metadata. Since a lot of people can’t read Indian languages, you have to provide them with a transliteration and advise them on which standards and norms can be used in relation to Indian languages.

Our readers would love to know how many languages you speak!

Honestly speaking, I am comfortable with two or three languages only. Those would be English, Gujarati (which is my mother tongue) and French. Maybe I’d even add German to that list. Apart from that, I could get along in quite a few, but that is not in any way “knowing” the language in its true sense. To really know a language would mean to be entirely at ease with that language and be able to function with it at all levels.

I work with about 10 languages in my field – 3 Perso-Arabic, Kashmiri, Hindi, Urdu, Marathi, Gujarati of course, Konkani, Nepali and could manage a bit of Bangla and Punjabi as well. So I could read the texts and manage a decent level of these languages, even guide people to a certain extent. But it would be stupidity to say that I am a language expert where these languages are concerned!

Do you have any regrets or an alternate path of any kind that you may have chosen in your younger days, if not for this one?

(laughs) My only regret is that I came into this world of the digital revolution a bit late. I was a bit old and had to learn everything from scratch. A real tragedy. I had to learn programming on my own. Fortunately I had some great colleagues here. I joined as a language expert initially, but you tend to feel the need to write your own programs, so that’s when I started learning how to program.

Your dedication and progress with machines are commendable. However, on a social level, would you prefer machines or people?

(Grins) Machines. For the simple reason that machines are logical. I might sound like the character Sheldon from the show The Big Bang Theory, but they are extremely logical indeed. There is no room for illogicality where a machine is concerned. (laughs) Once you’re hooked on to a machine and it does something wrong, you can correct it, but if something goes wrong with a human being, the repercussions are quite vast. Hence I prefer working with machines. That’s why I also like keeping to myself. Actually not even so much to myself, but rather to machines!

Alifya Thingna - Associate Director | Key Accounts

Having grown up around the Middle East and India, Alifya is a shy, yet friendly and colourful personality with a keen interest in human psychology, ethnology and contemporary dance forms. An aesthete by nature, she is extremely passionate about getting to know new people, immersing herself in new cultures, writing and doing the 'little things' that make this world a better place to live in. She also has a Master's degree in French literature, enjoys biking and is the modern definition of a logophile and an equalist.