Is Artificial Intelligence the Key to Translating Ancient Languages? - Creative Word

A team from MIT and Google’s AI lab in Mountain View, California, have developed a machine-learning system which they’ve shown is capable of automatically deciphering ancient texts.

According to MIT Technology Review, Jiaming Luo and Regina Barzilay from MIT, and Yuan Cao from Google, have demonstrated the system by using it to decipher a script known as “Linear B”, which was discovered in 1886 by the British archaeologist Arthur Evans and originally deciphered by an amateur linguist named Michael Ventris in 1953.

The approach the team used to automatically translate the text differs from standard machine translation, which works on the basis that words are related to each other in similar ways, regardless of the language being used.

The concept behind normal machine translation is that a language is mapped out using huge amounts of text, which is then searched for word patterns to see how often each word appears next to every other word.
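As a rough illustration of this counting step, the minimal Python sketch below tallies how often pairs of words appear next to each other. The two-sentence corpus, the function name cooccurrence_counts and the window parameter are invented for illustration; real systems work from vastly larger bodies of text.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=1):
    """Count how often each word appears within `window` positions of another word."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[(word, words[j])] += 1
    return counts


# Toy corpus (an assumption for illustration only).
corpus = [
    "the king rules the land",
    "the queen rules the land",
]
counts = cooccurrence_counts(corpus)
print(counts[("rules", "the")])  # "the" follows "rules" once in each sentence
```

The row of counts belonging to a single word is the raw material for the "signature" discussed next.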

This pattern of appearances is a unique signature that defines the word in a multidimensional parameter space. The word can be thought of as a vector within this space, and this vector constrains how the word can appear in any translation the machine comes up with.

The vectors obey simple mathematical rules, such as king – man + woman = queen. A sentence can then be thought of as a sequence of vectors (or words) that follow one another.
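The arithmetic can be tried out on a toy scale. In the sketch below the three-dimensional vectors are hand-picked assumptions (the axes loosely stand for royalty, maleness and femaleness), not values learned from real text, and the helper names cosine and analogy are made up for this example.

```python
import math

# Hand-picked toy vectors (an assumption): axes are roughly
# (royalty, maleness, femaleness). Real word vectors are learned from text.
vectors = {
    "king": [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the three input words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


result = analogy("king", "man", "woman")
print(result)  # queen
```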

The key insight behind machine translation is that words in different languages follow these same patterns, so it is possible to map one language onto another with a one-to-one correspondence.

Machine translation becomes a simple system of finding similar patterns – the machine does not need to understand the words, just the patterns.
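To make the idea of matching patterns rather than meanings concrete, here is a small Python sketch. It assumes two tiny, hand-made vector spaces: the "Spanish" vectors are simply the "English" ones rotated by 90 degrees, so the coordinates differ but each word's pattern of similarity to its neighbours is preserved. The words, vectors and function names are all illustrative assumptions, and real systems learn such mappings from large corpora rather than comparing profiles this directly.

```python
import math


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def profile(word, space):
    """Sorted similarities between `word` and every other word in its own language."""
    return sorted(
        cosine(space[word], vec) for other, vec in space.items() if other != word
    )


def match_by_pattern(source, target):
    """Pair each source word with the target word whose profile is closest."""
    mapping = {}
    for word in source:
        src = profile(word, source)
        mapping[word] = min(
            target,
            key=lambda cand: sum(
                abs(s - t) for s, t in zip(src, profile(cand, target))
            ),
        )
    return mapping


# Toy spaces (assumptions): the second is the first rotated by 90 degrees.
english = {"king": (1.0, 0.0), "queen": (0.8, 0.6), "apple": (-0.6, 0.8)}
spanish = {"rey": (0.0, 1.0), "reina": (-0.6, 0.8), "manzana": (-0.8, -0.6)}

mapping = match_by_pattern(english, spanish)
print(mapping)
```

At no point does the code consult a dictionary: the pairing comes purely from each word's relationship to the other words in its own language.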

Of course, this process requires masses of data. However, a few years ago, German researchers showed how a similar process could work for rarer languages using much less data. The key was to discover an alternative way to constrain the machine translation so that it didn’t rely solely on a large database.

The MIT and Google team have taken this concept one step further, using knowledge based on the development of languages over time, to show how machine translation can decipher lost languages.

The Technology Review suggests that “any language can change in only certain ways—for example, the symbols in related languages appear with similar distributions, related words have the same order of characters, and so on. With these rules constraining the machine, it becomes much easier to decipher a language, provided the progenitor language is known”.
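One simple way to see why character order is such a useful constraint is to compare words by edit distance, which counts the insertions, deletions and substitutions needed to turn one string into another. The Python sketch below is a stand-in for illustration only, not the team's actual model; the English and German words and the function names are assumptions chosen for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute, or keep a match
                )
            )
        previous = current
    return previous[-1]


def closest_match(word, candidates):
    """Pick the candidate whose characters line up best with `word`."""
    return min(candidates, key=lambda cand: edit_distance(word, cand))


# Illustrative vocabulary from a known related language (German).
known_words = ["nacht", "tag", "wasser"]
best = closest_match("night", known_words)
print(best)  # nacht
```

Because related words tend to keep their characters in the same order, the true counterpart usually needs far fewer edits than an unrelated word, which narrows the search considerably.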

The team have trialled their translation technique on two lost languages: Linear B (mentioned above), which is an early form of Greek, and Ugaritic, an early form of Hebrew discovered in 1929.

These ‘constraints’, imposed by the way languages evolve, meant that the team were able to correctly translate 67% of Linear B cognates (related words) into their Greek equivalents.

The possibilities these results could open up for further research into lost languages are huge, and could lead to new linguistic discoveries which aren’t limited by human endeavour.

The next few years are likely to be an interesting time for machine translation and AI developments – watch this space for updates!