Look Who’s Talking

Suparna Dutt DCunha February 07, 2022

Voice technology is advancing by leaps and bounds. Innovation in the space, including multilingual voice applications, is bringing a new level of ease to interactions.

Nowadays, everything has a voice — your phone, car, and even the coffee machine. People use voice technology to shop, send messages, and look for information. And so credible voice technology is becoming vital.

The human voice is rhythmic and has intonation that makes speech sound, well, human. Also, in most basic human conversation, we remember, recognise context, and apply the same vocabulary to mean something specific based on a previous conversation.

Replicating these abilities in voice technology is complex. Tech innovators have not fully solved the problem yet, but they are getting there with smarter conversational systems. For businesses, interactive AI opens up a world of possibilities.

Since Alexa, Google Assistant, Siri, and Cortana brought about a paradigm shift in how humans interact with technology, tech giants have been investing enormous energy and capital in developing voice technologies.

When Alphabet CEO Sundar Pichai demonstrated Google Duplex at Google I/O 2018, with the Google Assistant making phone calls to set up things like restaurant reservations or salon appointments, it was the best thing we had seen up to that point.

Google Assistant was having a “real” conversation: the person on the other end had no idea they were talking to an AI assistant. Even Pichai’s audience had to be reminded. The Assistant worked thoughtful pauses and a natural-sounding “mmhmm” into the request.

Since then, innovation in voice technology has advanced by leaps and bounds, and tech innovators are trying to bring a new level of ease to voice interactions. Imagine adjusting the voice in your car and coffee machine to your preference.

Recently, voice technology company DAISYS achieved a breakthrough in developing human-sounding voices using artificial intelligence. The innovation, which narrates written texts naturally, generates new, realistic-sounding voices. Speech properties such as speed and pitch can be adjusted in real time to customise the voice.

“This is a huge breakthrough. Up until now, naturally sounding voices were always deepfake, based on audio data of professional speakers. With this technology, we are able to create new voices that sound like real people,” said Barnier Geerling, CEO of DAISYS.

“In addition,” Geerling said, “this technology makes it easier and faster to apply speech-steered technology. The market potential is enormous, think of audio-visual media using voice-overs, or ‘talking’ cars, robots, or appliances. For manufacturers, this means the possibility to integrate realistic speech in their products becomes much easier and more efficient.”

Last year, Nvidia launched tools that can capture natural speech qualities. To improve AI voice synthesis, Nvidia’s text-to-speech research team developed a RAD-TTS model that allows individuals to train a text-to-speech model with their voice, including the pacing, tonality, timbre, and more.

And what about communicating in different languages? For multinational organisations, speaking to customers in different languages means going the extra mile to hire multilingual staff and making sure that calls go through to the agent who can speak the caller’s language. PolyAI, a company that builds and deploys voice assistants for automating customer services, creates voice self-service experiences in any language, regardless of accents, to tackle this problem.

It provides enterprise-ready voice assistants, purpose-built for multilingual voice applications, that can accurately capture valuable information, such as names and addresses, in any language.

Training a multilingual model, which can handle multiple languages simultaneously, is more efficient than training a monolingual model for every language. A Facebook survey suggests that multilingual models perform better than monolingual models, especially for low-resource languages. In particular, XLM-R, a 100-language model introduced by Facebook AI researchers in 2019, achieved state-of-the-art results in cross-lingual transfer and is competitive with English BERT on an English benchmark.

Consumers are becoming increasingly accustomed to natural-sounding voices, and to keep up, enterprises need to ensure that the voice technology they use meets those expectations.

The go-to technologies are natural language understanding (NLU), conversational artificial intelligence (AI), and neural text-to-speech (TTS). Together, these technologies effectively make voices sound more human-like than those in smart assistants and respond more appropriately to consumer grievances and questions.

For example, an automaker can deploy a more sympathetic voice to warn a driver about safety hazards and a more cheerful voice to communicate a scheduled service reminder at a dealership. With NLU, AI, and TTS, enterprises can create more engaging experiences for consumers. And to sustain interaction both ways, conversational AI must continuously learn how to understand human requests better.
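As a rough illustration of the automaker example, the sketch below picks a speaking style by message type and wraps the text in SSML `prosody` markup, a W3C standard that many neural TTS engines accept. The message categories, the rate and pitch choices, and the `to_ssml` function are assumptions made for illustration; they are not taken from any vendor's product, and a production system would likely use richer, engine-specific style controls.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from message type to SSML prosody settings.
# The values are illustrative stand-ins for "sympathetic" and "cheerful".
VOICE_STYLES = {
    "safety_warning": {"rate": "slow", "pitch": "low"},
    "service_reminder": {"rate": "medium", "pitch": "high"},
}
DEFAULT_STYLE = {"rate": "medium", "pitch": "medium"}


def to_ssml(message_type: str, text: str) -> str:
    """Wrap text in a minimal SSML document styled by message type."""
    style = VOICE_STYLES.get(message_type, DEFAULT_STYLE)
    return (
        "<speak>"
        f'<prosody rate="{style["rate"]}" pitch="{style["pitch"]}">'
        f"{escape(text)}"  # escape &, < and > so the markup stays valid
        "</prosody>"
        "</speak>"
    )


if __name__ == "__main__":
    print(to_ssml("safety_warning", "Tyre pressure is low. Please pull over safely."))
    print(to_ssml("service_reminder", "Your service is booked for Monday at 10 am."))
```

The resulting string would be passed to whichever TTS engine the enterprise uses; unknown message types fall back to a neutral style.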

Voice technology has been improving since the introduction of Siri in 2011, mainly because recorded voice data has grown exponentially. Until recently, initiating a back-and-forth conversation that encourages relevance, resembles personal opinion, and extends new suggestions was nearly impossible.

But recently, a conversational search app, MeetKai, has tackled this problem by building proprietary voice technology. It trained a digital voice assistant, Kai, to hold advanced conversations: it can comprehend negation, remember the context of queries just as a human would, and give accurate results without restarting the query from zero.
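To make the idea of remembering context and handling negation concrete, here is a deliberately naive sketch. It is not MeetKai's method, which is proprietary; the catalogue, the keyword vocabulary, and the `Conversation` class are all hypothetical, and real assistants rely on trained language-understanding models rather than keyword matching.

```python
import re
from dataclasses import dataclass, field

# Hypothetical catalogue; names and tags are made up for illustration.
RESTAURANTS = [
    {"name": "Trattoria Uno", "tags": {"italian", "cheap"}},
    {"name": "Casa Roma", "tags": {"italian", "expensive"}},
    {"name": "Sakura House", "tags": {"japanese", "cheap"}},
]
VOCABULARY = set().union(*(r["tags"] for r in RESTAURANTS))
NEGATIONS = {"not", "no", "without"}


@dataclass
class Conversation:
    """Keeps wanted and unwanted terms across turns."""
    wanted: set = field(default_factory=set)
    unwanted: set = field(default_factory=set)

    def ask(self, utterance: str) -> list:
        """Update the remembered context, then search with all of it."""
        negated = False
        for word in re.findall(r"[a-z]+", utterance.lower()):
            if word in NEGATIONS:
                negated = True  # applies to the next known term only
            elif word in VOCABULARY:
                if negated:
                    self.unwanted.add(word)
                    self.wanted.discard(word)
                else:
                    self.wanted.add(word)
                    self.unwanted.discard(word)
                negated = False
        return [
            r["name"]
            for r in RESTAURANTS
            if self.wanted <= r["tags"] and not (self.unwanted & r["tags"])
        ]


if __name__ == "__main__":
    chat = Conversation()
    print(chat.ask("Find me an Italian restaurant"))
    print(chat.ask("Not expensive, please"))
```

The second request does not repeat "Italian", yet the earlier preference still applies because the conversation object carries it forward, which is the behaviour the paragraph above describes.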

As voice technology evolves, virtual personal assistants may provide impressively accurate results to each individual, thus becoming more beneficial to each user, offering specific solutions instead of delivering the same generalised answer to all.

Voice search technology will have a great impact on the consumer market this year and beyond, as the percentage of customers using voice search is only going to increase. It won’t be long before voice AI is customised to business challenges and integrated with business processes and internal systems such as CRM and ERP.
