Meta’s AI Translation Means No Language Left Behind

Meta’s new AI translation tool aims to drive language inclusivity through the No Language Left Behind (NLLB) project, which it is hoped will deliver evaluated, high-quality translations directly between 200 languages, including low-resource languages such as Asturian, Luganda, and Lao.

The single AI model, known as NLLB-200, is claimed to provide quality translations across 200 languages, and has been evaluated against a new dataset, FLORES-200, which measures translation performance to ensure it meets a high standard.
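To make the idea of measuring translation quality concrete, the short sketch below scores a system’s output against human reference translations using the chrF metric from the sacrebleu library; the example sentences and the metric choice are illustrative assumptions rather than details taken from FLORES-200 itself.

```python
# A minimal sketch of scoring machine translations against human references,
# assuming the sacrebleu library; the sentences and the metric choice (chrF)
# are illustrative, not taken from FLORES-200.
import sacrebleu

# System output and the corresponding human reference translations.
hypotheses = [
    "The clinic opens every Friday morning.",
    "Please wear a helmet on the building site.",
]
references = [[
    "The clinic is open every Friday morning.",
    "Please wear a helmet on the construction site.",
]]

# Character n-gram F-score (chrF); higher is better.
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"chrF: {chrf.score:.1f}")
```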

Data from Meta suggests that “NLLB-200 exceeds the previous state of the art by an average of 44 percent.”

For Meta, this project has been created to improve and extend translations on Facebook, Instagram, and Wikipedia, so that even users whose native language is relatively rare can engage with posts, topics, and conversations in other languages.

It is hoped this will be a lifeline for users who would otherwise have limited engagement with digital content in other languages.

To this end, Meta has focussed NLLB-200 on languages found in Africa and Asia, as many of these languages, such as Kamba and Lao, were not supported by existing translation tools.

At present, NLLB-200 supports 55 African languages with “high quality” results.

Meta is also open-sourcing the NLLB-200 model to enable other researchers to extend the project to include more languages and advance further inclusive technologies.
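For researchers who want to experiment, the open-sourced model can be loaded through the Hugging Face transformers library. The sketch below is a minimal example; the checkpoint name, language codes, and example sentence are assumptions for illustration rather than details taken from Meta’s announcement.

```python
# A minimal sketch of translating with an open-sourced NLLB-200 checkpoint,
# assuming the distilled 600M-parameter model published on Hugging Face.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "facebook/nllb-200-distilled-600M"  # assumed checkpoint name

# Source and target codes follow the FLORES-200 convention,
# here English (eng_Latn) -> Luganda (lug_Latn).
tokenizer = AutoTokenizer.from_pretrained(MODEL, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

inputs = tokenizer(
    "Health information should be available in every language.",
    return_tensors="pt",
)
generated = model.generate(
    **inputs,
    # Force the decoder to start with the Luganda language token.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("lug_Latn"),
    max_length=128,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```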

There are even grants of up to $200,000 available to non-profit organisations developing real-world applications of NLLB-200.

Meta has partnered with the Wikimedia Foundation (the non-profit organisation that hosts Wikipedia) to help improve the quality of translations on Wikipedia, especially for low-resource languages.

The disparity on Wikipedia between a high-resource language such as Swedish, which has around 2.5 million articles serving about 10 million speakers, and a low-resource language such as Lingala, which has just 3,260 articles for more than 45 million speakers, is obvious and has needed addressing for some time; it is hoped that NLLB-200 might shift this imbalance.

The NLLB-200 project is an advance on the M2M-100 translation model, announced in 2020, which used similar processes for translation.

However, the reach of NLLB-200 is much greater due to further advances in data-sharing for similar languages, regularisation and curriculum learning, self-supervised learning, and diversifying back-translation.
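Back-translation, one of the techniques mentioned above, can be illustrated with a short sketch: monolingual sentences in the target language are machine-translated back into the source language to create synthetic training pairs, and sampling during generation is one simple way to diversify them. The checkpoint name, language pair, and decoding settings below are assumptions for illustration only, not Meta’s actual training setup.

```python
# A minimal sketch of diversified back-translation, assuming the publicly
# released NLLB-200 distilled checkpoint on Hugging Face; the model name and
# language codes are illustrative assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "facebook/nllb-200-distilled-600M"  # assumed checkpoint name

# Monolingual target-side sentences (here: English) for which we want
# synthetic source-side (here: Lao) training data.
monolingual_english = [
    "Vaccines are available at the local clinic every Friday.",
    "Please wear a helmet when operating the machinery.",
]

tokenizer = AutoTokenizer.from_pretrained(MODEL, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

synthetic_pairs = []
for sentence in monolingual_english:
    inputs = tokenizer(sentence, return_tensors="pt")
    # Sampling (rather than greedy decoding) diversifies the synthetic
    # source sentences produced by back-translation.
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("lao_Laoo"),
        do_sample=True,
        top_p=0.9,
        max_length=128,
    )
    synthetic_lao = tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
    # The (synthetic Lao, real English) pair can now be added to the
    # Lao -> English training data.
    synthetic_pairs.append((synthetic_lao, sentence))

print(synthetic_pairs)
```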

Machine translation tools such as NLLB-200 could lead the way in advancing information sharing around the globe, with real-world applications such as subtitling training and safety videos, translating educational resources and health information, or even automatically subtitling mainstream movies for viewers no matter what language they speak or where they are located.

Watch this space for news on further advances in Machine Translation, and to find out more about how we use MT in professional translations, check out our blog page or contact us.