Meta Releases Open Source AI Machine Translation Model

In a project called "No Language Left Behind," Meta has built an artificial intelligence model, NLLB-200, that can translate text across 200 languages. Modeling techniques and lessons learned from the model will be used to improve and extend translations on Facebook, Instagram and Wikipedia, the company said in a news release.

NLLB-200 was designed with a focus on African languages, for which it can be difficult to find sufficient data to train an AI model. For example, there are 20 million native speakers of Luganda, a language of central Uganda, but "examples of this written language are extremely difficult to find on the internet," Meta explained. "The reality is that a handful of languages dominate the web, so only a fraction of the world can access content and contribute to the web in their own language. We want to change this by creating more inclusive machine translations systems — ones that unlock access to the web for the more than 4 billion people around the world that are currently excluded because they do not speak one of the few languages content is available in."

The company worked with professional translators to develop a benchmark for automatically assessing NLLB-200's translation quality and to conduct a human evaluation of the model's output. After measuring the quality of NLLB-200's output in each of the 200 languages, Meta found that it outperforms previous models by an average of 44 percent.

"Africa is a continent with very high linguistic diversity, and language barriers exist day-to-day. We are pleased to announce that 55 African languages will be included in this machine translation research, making it a major breakthrough for our continent," said Balkissa Ide Siddo, public policy director for Africa at Meta, in a statement. "In the future, imagine visiting your favorite Facebook group, coming across a post in Igbo or Luganda, and being able to understand it in your own language with just a click of a button — that's where we hope research like this leads us. Highly accurate translations in more languages could also help to spot harmful content and misinformation, protect election integrity, and curb instances of online sexual exploitation and human trafficking." 

Meta is releasing NLLB-200 as open source and publishing research tools for extending the model to more languages and technologies. It also plans to distribute up to $200,000 in grants to nonprofit organizations to develop real-world applications for the model.

A demo using NLLB-200 to translate children's stories from around the world is available here.

About the Author

Rhea Kelly is editor in chief for Campus Technology, THE Journal, and Spaces4Learning. She can be reached at [email protected].
