Crowdsourcing and machine translation: the start of a beautiful friendship

Machine-aided translation is one of those things people love to hate. Despite the best efforts of enthusiasts like myself, the majority of computer users still believe that machines are useless translators.

The whole area of machine translation has a terrible image problem. There are endless jokes and “true” stories about computer translation failures. Some of these are very funny (like the machine that apparently translated the English saying “out of sight, out of mind” into “invisible idiot” in Russian). However, with a little crowdsourcing help, I suspect the machines may have the last laugh.

In defense of machines
Sometimes I almost feel there is a conspiracy against computers. Take the entry on Machine-aided translation in the Finnish version of Wikipedia. Far from being neutral, the article’s author seems determined to rubbish all machine translation. He offers readers a Finnish translation of the following English text, done using Google’s translator:

“William Shakespeare (baptized 26 April 1564 – 23 April 1616) was an English poet and playwright, widely regarded as the greatest writer in the English language and the world’s preeminent dramatist.”

“William Shakespeare (kastettu 26 huhtikuu 1564 – 23 huhtikuu 1616) oli Englanti runoilija ja näytelmäkirjailija, laajalti pidetään suurin kirjailija, että Englanti kielen ja maailman preeminent näytelmäkirjailija.”

For readers who do not understand Finnish, let me explain: this is a lousy translation. Of course it is! The Wikipedia author wanted the translation to be lousy just so he could prove his point. People like this guy treat computers like electronic slaves. Instead of learning how machine translation actually works, they just bash the keys, then yell “I told you so” when the machine (quite understandably) fails to deliver.

I prefer to view computers as partners and collaborators. In this spirit, I politely ask Google Translate (I call her GT) to do the work she is best at, and help her with the rest.

When I collaborate with GT, I first convert the Finnish text into what I call Googlish: a simplified version of the Finnish language which GT understands well. The variant of Googlish I use is one I have constructed specially for translating from Finnish into English.

Softly, I whisper: “Please, GT, translate the Finnish national song ‘Maamme’ (Our Land). I will convert the Finnish lyrics into Googlish, then you do the translation, and finally I’ll brush up your text a little bit.”

Here is an extract from our result:

Our country is poor and will remain so,
if it’s gold you want.
A stranger walks by us proud,
but this is the land we love,
its forests, its mountains and its reefs,
they to us are dear.

Dear GT, thank you. It is an honor to be your collaborator and friend!

Inviting the crowd
I’m sure you can see the “crowdsourcing potential” of this human/computer approach. I just ask a crowd of native Finnish speakers to do the pre-editing phase (Finnish to Googlish); then, after the machine translation, I ask a crowd of native English speakers to do the final brush-up. These two crowds can certainly do the work much faster and cheaper than I can (and probably also considerably better).
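
For the programmers among you, this workflow can be sketched as a simple three-stage pipeline. This is only a toy illustration: every function name here is hypothetical, and the stand-in stages merely tag the text so you can see the order of the steps. No real translation service or crowdsourcing platform is called.

```python
from typing import Callable, List

# A stage takes one line of text and returns the processed line.
Stage = Callable[[str], str]


def translate_with_crowds(
    lines: List[str],
    pre_edit: Stage,           # hypothetical: Finnish crowd rewrites text into "Googlish"
    machine_translate: Stage,  # hypothetical: the machine translation step
    post_edit: Stage,          # hypothetical: English crowd does the final brush-up
) -> List[str]:
    """Run each line through pre-editing, machine translation, then post-editing."""
    return [post_edit(machine_translate(pre_edit(line))) for line in lines]


# Toy stand-ins (assumptions for illustration only).
def toy_pre_edit(line: str) -> str:
    # Pretend "simplifying" just means tidying up whitespace.
    return " ".join(line.split())


def toy_machine_translate(line: str) -> str:
    # Mark the text as machine-translated instead of really translating it.
    return f"EN({line})"


def toy_post_edit(line: str) -> str:
    # Mark the text as reviewed by a human.
    return f"{line} [checked]"


result = translate_with_crowds(
    ["  Maamme  "], toy_pre_edit, toy_machine_translate, toy_post_edit
)
print(result)
```

In a real setup, each stand-in would be replaced by a call that hands the line to a crowd worker or to a translation engine; the point of the sketch is only that the machine sits in the middle, with humans on either side.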

Language is a skill that took us humans hundreds of thousands of years to develop. Given that computers have only been “evolving” for a few decades, their language skills are really very impressive. I’m convinced that machine-aided translation has enormous potential to help people understand and communicate better. Just as long as we also learn to understand and communicate a bit better with our machines…

*         *         *         *         *

Guest post written by Dr. Hannu I. Miettinen, and previously published on the Microtask blog: