Meta’s open-source AI model leaves no language behind

With every innovation, social metaverse company Meta inches closer to fulfilling its mission of “empowering people to build community and bring the world together.” Today, the company announced a research breakthrough in its No Language Left Behind (NLLB) project, which is designed to deliver high-quality machine translation capabilities for most of the world’s languages.

In the words of Meta’s founder and CEO Mark Zuckerberg, “We just open sourced an AI model that we built that can translate into 200 different languages, many of which are not supported by current translation systems. We call this project No Language Left Behind, and the AI modeling techniques we used help create high-quality translations for languages spoken by billions of people around the world.”

More languages, less communication

With a global digital population of more than five billion people speaking 7,151 languages, it’s no wonder that modern translation systems are in high demand. However, the lack of linguistic data limits the scope of translation technologies that attempt to overcome language barriers in the consumption of digital content. Despite the sophistication of Google’s multilingual neural machine translation offering, Google Translate, its translation capabilities are limited to 133 languages.

Microsoft Bing Translator, another translation tool from one of the world’s largest technology companies, supports just over 100 languages. Since more than half of the world’s population speaks only 23 of the 7,151 world languages, and those few dominate the internet, many low-resource languages (especially in Africa and Asia) are not supported by these systems. This hinders the flow of interaction between speakers of those languages and the content they want to consume.

AI and translation in the enterprise

Of the many ways artificial intelligence (AI) is redefining human interaction and efficiency, translation is one of the most exciting. Machine translation, the manifestation of AI in translation, is a market valued at $800 million as of 2021, with an estimated value of $7.5 billion by 2030.
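As a back-of-envelope sanity check (not a figure from the cited report), those two data points imply a compound annual growth rate of roughly 28% per year:

```python
# Implied compound annual growth rate (CAGR) from the cited figures:
# a market growing from $800M (2021) to a projected $7.5B (2030).
start_value = 0.8e9   # 2021 market size in USD
end_value = 7.5e9     # 2030 projected market size in USD
years = 2030 - 2021   # 9-year horizon

cagr = (end_value / start_value) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # roughly 28% per year
```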

Global Market Insights revealed that enterprises’ growing need to improve the customer experience is a key driver of growth in the machine translation industry. This is substantiated by Gartner’s research, which shows that translation is a broad enterprise concern, especially as it becomes increasingly relevant in four major synchronous and asynchronous use cases: multimedia (e.g., training and seminars), online sales and customer support (e.g., queries and chatbots), real-time multimedia (e.g., meetings) and documents, texts and segments (e.g., blogs and product info).

That’s why enterprises looking to achieve more global reach need inclusive translation solutions that meet the increasingly complex demands of a global consumer base. This is where Meta’s project comes in.

A breakthrough in high-quality machine translation

Launched more than six months ago, the NLLB project is Meta’s ambitious effort to build a universal language translator that can process any language, regardless of the linguistic data available to the AI. Today Meta announced a breakthrough in this project called NLLB-200 — a single AI model that translates over 200 different languages with state-of-the-art results.

This model supports high-quality translation of lesser-used languages, especially from Asia and Africa. For example, the model supports translation of 55 low-resource African languages, a 46% increase over what is available with existing translation tools.

Meta claims that for some African and Indian languages, this model improves on existing translation systems by more than 70%, and that it also achieves an average 44% increase in Bilingual Evaluation Understudy (BLEU) scores across the 10,000 directions of the FLORES-101 benchmark.

Source: Meta
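For readers unfamiliar with the metric: BLEU scores a machine translation by comparing its n-grams against a human reference. The sketch below is a simplified single-reference version for illustration only; Meta’s FLORES evaluation uses a standardized, tokenization-aware BLEU pipeline, not this code.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions times a brevity penalty (single reference)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each n-gram's count by its count in the reference,
        # so repeating a correct word cannot inflate the score.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(max(clipped, 1e-9) / total)  # smooth zero counts
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages overly short translations.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0 (exact match)
print(bleu("the cat is on the mat", "the cat sat on the mat"))   # partial credit
```

A “44% increase in BLEU” thus means the model’s translations share substantially more n-grams with human references than prior systems’ output does.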

To give an impression of the scale, Zuckerberg reveals that “the 200-language model has more than 50 billion parameters, [trained] using [Meta’s] new Research SuperCluster (RSC), one of the world’s fastest AI supercomputers. Progress here will enable more than 25 billion translations in our apps every day.”

Despite this breakthrough, Meta realizes that achieving the NLLB project’s objectives is not possible without innovative collaboration. To enable other researchers to expand its language reach and build more inclusive technologies, it has made the NLLB-200 model open source and is also providing grants of up to $200,000 to nonprofit organizations to apply NLLB-200 in their operations.

The far-reaching implications of this model for the more than 25 billion daily translations on Meta’s platforms will accelerate collaboration and community building that defy linguistic and geographic barriers. According to Zuckerberg, “Communicating in different languages is a superpower that AI offers, but as we continue to improve our AI work, it improves everything we do – from showing the most interesting content on Facebook and Instagram, to recommending more relevant ads, to keeping our services safe for everyone.”

Wikipedia will also use this technology to translate its articles into more than 20 languages with limited resources.

To discover how this model works, try the demo.

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Learn more about membership.