As India’s leading language service provider, we are often asked about machine translation technology and its implications on the industry. “Will machines replace human translators?” “Is it like Google Translate?” “How can we be assured of the quality of machine translated output?” are some of the questions everyone seems to be asking. So who better than Dion Wiggins, CTO of Omniscien Technologies, to talk about MT Technology as a “happy collaboration” between humans and machines?
During one of your talks at the ATC conference in 2017, you joked that machine translation is, and will continue to be, the technology of the future. Could you elaborate a little on this?
Well, back in the 50s, a group of researchers made a very bold claim that machine translation would be a solved problem in five years. And every five years since, they have added another five years, and that continues even today. So it has obviously become quite a joke. That doesn’t mean machine translation hasn’t improved. It most certainly has, and it has been getting a lot better at a rapid rate in the last few years. The last year has been particularly good for MT with the introduction of Neural Machine Translation.
In many cases, we have outputs where it is very difficult, if at all possible, to tell the difference between translations done by a machine and by a human translator. But that only happens when we train the machine on high-quality data that is sufficient and focused. I say focused because language is full of ambiguities. Take the word “run,” for instance. You could “do a print run” or “go for a run.” The word is, in fact, the most ambiguous word in the English language, with 197 meanings. In such cases, specialised engines trained for particular domains have been producing translations of good quality.
A lot has been said about how machine translation engines could or could not replace human translators. What is your personal take on this?
I think we are many years away from full replacement. Like I said, a translation engine still needs context because it cannot understand the way a human would. Secondly, although MT takes away some work from human translators, it gives back many times over in the form of editing and other work. Sometimes, the existence of machine-plus-human frameworks also makes certain deals cost-effective and hence possible. Let’s consider the knowledge base of an IT company that has accumulated ten thousand files over 10 years. Human translators would take years to translate all of that, and it would not be affordable. With machine translation plus post-editing (MTPE), such a project suddenly becomes affordable. So although MT takes away the first part of the translator’s job, it makes big projects possible that otherwise would never have gone through.
Another example is the work that we are doing with LexisNexis. Engines have translated millions of patents from various languages into, let’s say, Japanese or English. Now, if someone finds a patent in the online database and wants to take it to court, they get a human to re-translate or edit it. Without the productivity gain provided by MT, these patents would never have been translated at all. So I think this is a happy collaboration where humans will always be in the mix.
You have also said that MT can only be as perfect as humans are. And since humans can’t be perfect, what does this mean for machine translation technology?
Well, a better way to put this is that the machine is only as good as the data it is trained on. So if you train it on low-quality data, you should expect a low-quality translation. For example, there are translators who specialize in patents, but in the end, no translator can specialize in every patent domain. A machine, however, can be trained on a huge quantity of data, and then even a human who is not a specialist in that domain can edit the output, because the machine gets the terminology right. So, machines are getting better all the time, but they are not perfect. Neither are humans. It’s actually about productivity. The real question is, “How fast can a human fix the machine’s output?”
And sometimes neither the machine nor the human editor needs to be perfect. Depending on the content, there is another kind of translation, called monolingual translation. For hotel reviews, for example, no one cares if the terminology isn’t accurately translated, as long as the meaning is conveyed. So what the travel industry does is get a human translator to edit the translation without looking at the source. This increases productivity immensely, and the translation engine and the translator complement each other. Obviously, something like this won’t happen with life sciences or medical documents.
Machine translation technologies have made significant progress as far as European languages are concerned. How do you foresee the future of MT for all the major Indian languages?
One of the reasons why MT is so advanced in most European languages is that their governments have released large bodies of data. That hasn’t happened in India. Given how advanced India is in IT, it has a huge amount of multilingual data, but MT development is held back because that data hasn’t been released for research or public consumption. We have made attempts over the last 8 to 10 years to get access to it for research purposes, but that hasn’t gone through yet. It is holding India’s MT capabilities back. If such data were made available, you would see massive leaps in MT in India. Europe has a different approach: since it is public data anyway, governments have released it and funded projects to produce line-by-line translations that can then be used for research. India would do itself huge favours by doing exactly that.
But even without such data, we have a Hindi engine, and it is starting to improve. It obviously isn’t where the French or German engines are, but it is usable for productivity gains. Ultimately, it comes down to data. Just to put things into context, we have a few million sentences for Hindi, whereas we have 4 billion for German.