Tech

Google Translate now speaks Sepedi and Tsonga – but new technology means ‘it isn’t perfect’

Business Insider SA
Translation results may not be as good as for European languages, Google warns.
Translation results may not be as good as for European languages, Google warns.

  • Google Translate has added Sepedi and Tsonga to the languages it can handle.
  • The service already features Afrikaans, Sesotho, isiXhosa, and isiZulu.
  • The new set of languages, 24 in all, were added using a new machine-learning technique, Google says.
  • That means results may not always be as good as for the big European languages.
  • For more stories go to www.BusinessInsider.co.za.

Google Translate can now translate to and from Sepedi and Tsonga, it announced this week, though not yet as well as it should.

Sepedi and Tsonga are among 24 new languages added in "a technical milestone", using Zero-Shot Machine Translation. That technique makes it possible to build a translation engine for languages that do not have translation dictionaries or large amounts of translated text available. Instead it can learn a language using relatively small amounts of monolingual text from what is euphemistically referred to as "under-resourced languages".

That comes at a price.

"While this technology is impressive, it isn't perfect," warned Google Translate senior software engineer Isaac Caswell. "And we’ll keep improving these models to deliver the same experience you’re used to with a Spanish or German translation, for example."

In a more detailed post on the artificial intelligence approach to adding the new languages, Caswell and colleagues were more blunt.

"[W]e want to stress that the quality of translations produced by these models still lags far behind that of the higher-resource languages supported by Google Translate. These models are certainly a useful first tool for understanding content in under-resourced languages, but they will make mistakes and exhibit their own biases."

Google Translate has existing support for Afrikaans, Sesotho, isiXhosa, and isiZulu. It also features a number of languages from the region and the continent, including Shona and Swahili. But many others are missing, and monolingual model-building currently seems the best bet for their inclusion.

Despite the continent being home to some 2,000 languages, it is barely represented in the AI field of natural language process (NLP), says the grassroots Masakhane project, which is working to change that.

"The tragic past of colonialism has been devastating for African languages in terms of their support, preservation and integration. This has resulted in technological space that does not understand our names, our cultures, our places, our history."

Get the best of our site emailed to you every weekday.

Go to the Business Insider front page for more stories.