Microsoft’s Speech Translation is still all talk

At an event in China last month, Microsoft’s Rick Rashid unveiled a piece of technology that will likely attract a considerable amount of hype. In front of the company’s Asian 21st Century Computing gathering, the Chief Research Officer showed off speech recognition and automated spoken translation technology: his words were accurately translated into Mandarin, with his vocal tone synthetically carried through to the translated version. Judging by the reaction of observers, the demonstration was a success, and the technology raises interesting questions about the possibilities, and the limitations, of automated translation.

Much has been made of the voice recognition and emulation side of Rashid’s translation, which is at best an optional enhancement, and in some cases would seem an undesirable excess. It’s exciting, for sure, that a computer can imitate a person’s vocal habits – but it’s not earth-shattering. On the other hand, the suggestion from some quarters that we are now capable, to some degree, of replacing interpreters with computers is one worthy of serious scrutiny.

The question we need to ask, though, is how this would ever be possible. You might pin me as naïve, and you’d be half-right, but language entails more than a series of algorithms. Consider the relationship between semantics and pragmatics: one concerns itself with somewhat strict meanings and definitions, while the other is wrapped up in the implicit nature of what we say, in how we really use language. Which of these is more important? You could certainly argue that each requires the other to act as a balance, but it’s absolutely clear that the way we communicate involves more than mere dictionary definitions and how often one word appears near another.

It is common for us to assume that we can build machines capable of anything and everything, but the simple fact is that most of language is conducted on a very human level, in our instincts and the traits we share. For us to understand one another, we need to have a good idea of unspoken context, of the intricacies of a conversation, and of the peculiarities of much of our language. If a computer can do this at all, it cannot do it well. It cannot purposefully soften a verb to keep a diplomatic meeting from boiling over, and it cannot understand the in-joke and explain it to a new audience. Those things are in a different ballpark from what we’re currently excited about; the art of professional translation is still as essential as ever.