Google Translate caused a bit of a stir this week when some temporary irregularities in its Ukrainian-to-Russian translation were revealed. For a short period, the word ‘Russia’ was translated as ‘Mordor’ – the evil region of Middle Earth controlled by Sauron in The Lord of the Rings. Other mistranslations included ‘Occupant’ for ‘Russian’ and ‘Sad little horse’ for the Russian diplomat Sergei Lavrov.
Whether you look at these mistranslations in isolation or together, they are undeniably politically charged. Google claim this to be the work of a bug in one of their algorithms; we consider here in a little more detail how it could have happened.
Google Translate functions as a statistical machine translation system. It essentially works as a smart dictionary: for any language pair, a collection of texts (a corpus) is analysed to determine the frequency with which words in a source language are translated into particular words in a target language. These words are taken first in isolation and then in context, so that a complex web emerges; the machine learns both word-for-word and phrase-for-phrase translations, as well as how grammatical features such as prepositions, articles and cases can affect the way a word appears. This quasi-equivalence, whilst highly sophisticated, works better for some language combinations than for others. Ever noticed how Google Translate can, in some instances, turn English into passable French or Spanish, but gives you absolute nonsense for German, Japanese or Turkish? This is because the structural similarities between English and the Romance languages allow the statistical equivalence features to work more accurately. German’s different word order and vast variation in article use and adjectival endings, and the agglutinative structures of Japanese and Turkish, make it much harder to gather statistical data relevant to a word’s context when translating from English.
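The frequency-counting idea behind this can be sketched in a few lines. The words and counts below are invented for illustration; a real system would derive alignments from millions of sentence pairs and condition on surrounding context rather than single words.

```python
from collections import Counter, defaultdict

# Hypothetical word alignments extracted from a parallel corpus:
# (source word, observed target translation) pairs.
aligned_pairs = [
    ("house", "maison"), ("house", "maison"), ("house", "domicile"),
    ("cat", "chat"), ("cat", "chat"), ("cat", "chatte"),
]

# Build a frequency table: for each source word, count how often
# each candidate translation was observed.
table = defaultdict(Counter)
for src, tgt in aligned_pairs:
    table[src][tgt] += 1

def translate(word):
    """Pick the most frequently observed translation; fall back to the word itself."""
    candidates = table.get(word)
    return candidates.most_common(1)[0][0] if candidates else word

print(translate("house"))  # "maison" wins, 2 observations to 1
```

Context-aware systems extend the same principle to phrases, so that the counts for a word depend on its neighbours rather than on the word alone.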
So, for two languages with a lot in common, such as Ukrainian and Russian, big translation blunders should be few and far between. How, then, could these errors occur? Well, Google Translate’s corpora are compiled both from collections of texts found on the web and from information entered by users. When you use Google to translate something, it stores the text you’ve written and builds it into the relevant corpus. So, if enough Ukrainian speakers were referring to Russia as Mordor on published websites, this could arguably feature within the corpus. Google also benefits from direct user input: when you translate a text, you can suggest alternative options for a particular word or phrase if you feel a better translation exists. So again, if enough Ukrainian speakers submitted this suggestion, it could theoretically happen. This is in part what makes the joke. But it’s plain to see that this couldn’t be an accident. The scale on which these things would have to occur, as well as Google’s own common-sense safeguarding procedures (I assume they exist), would prevent any such mistranslation; otherwise pranksters could compile a far more widespread list of mistranslations for a language and unleash automated translation anarchy.
It could be the case, then, that someone simply tricked the algorithms into thinking that ‘Mordor’ was the most frequent translation of ‘Russia’ in the Ukrainian corpus, or that it was the most popular suggestion from contributors, and found a way to get it validated. Alternatively, someone on the inside could have been making their political opinions known by allowing the errors to slip through. Google have now fixed the mistranslations, but it would be interesting to know for sure how this happened; we can only speculate, based on our own limited knowledge.
If you could replace a word or phrase in your mother tongue with a silly translation, what would it be?
12 January 2016 11:29