Meta Releases Open Source AI Machine Translation Model

In a project called "No Language Left Behind," Meta has built an artificial intelligence model — NLLB-200 — that can translate text across 200 different languages. Modelling techniques and learnings from the model will be used to improve and extend translations on Facebook, Instagram and Wikipedia, the company said in a news release.

NLLB-200 was designed with a focus on African languages, for which it can be difficult to find sufficient data to train an AI model. For example, there are 20 million native speakers of Luganda, a language of central Uganda, but "examples of this written language are extremely difficult to find on the internet," Meta explained. "The reality is that a handful of languages dominate the web, so only a fraction of the world can access content and contribute to the web in their own language. We want to change this by creating more inclusive machine translations systems — ones that unlock access to the web for the more than 4 billion people around the world that are currently excluded because they do not speak one of the few languages content is available in."

The company worked with professional translators to develop a benchmark for automatically assessing NLLB-200's translation quality and to conduct a human evaluation of the model's output. After measuring the quality of NLLB-200's output in each of the 200 languages, Meta found that it outperforms previous models by an average of 44 percent.

"Africa is a continent with very high linguistic diversity, and language barriers exist day-to-day. We are pleased to announce that 55 African languages will be included in this machine translation research, making it a major breakthrough for our continent," said Balkissa Ide Siddo, public policy director for Africa at Meta, in a statement. "In the future, imagine visiting your favorite Facebook group, coming across a post in Igbo or Luganda, and being able to understand it in your own language with just a click of a button — that's where we hope research like this leads us. Highly accurate translations in more languages could also help to spot harmful content and misinformation, protect election integrity, and curb instances of online sexual exploitation and human trafficking." 

Meta is releasing NLLB-200 as open source and publishing research tools for extending the model to more languages and technologies. It also plans to distribute up to $200,000 in grants to nonprofit organizations to develop real-world applications for the model.

Meta has also published a demo that uses NLLB-200 to translate children's stories from around the world.

About the Author

Rhea Kelly is editor in chief for Campus Technology, THE Journal, and Spaces4Learning.
