A study by UIC Barcelona lecturer Maria Fitó highlights the current limitations of generative voice AI

14/02/2025

Maria Fitó-Carreras, lecturer in UIC Barcelona’s Faculty of Communication Sciences, has recently published the study Análisis de softwares de inteligencia artificial generativa de voz aplicados al podcasting (Analysis of generative artificial intelligence voice software applied to podcasting) in the scientific journal Comunicación y Hombre. In it, she analyses the main software used by podcast creators for voice cloning.

The academic study highlights the shortcomings that generative artificial intelligence voice software still has in imitating human vocal patterns in professional sectors such as podcasting. Despite the rapid progress of the generative artificial intelligence (GAI) industry, it is not yet seen as an immediate threat due to its inability to reproduce prosody accurately.

The researcher analysed 11 GAI software programmes, divided into three categories: those that generate cloned voices from a user sample, those that function as voice banks, and those that offer a combination of both functionalities. At the same time, the study gathered the opinions of 10 podcast creators who frequently use this technology.

The first conclusion points to a discrepancy between the results claimed by technology companies—who promise hyper-realistic voice generation—and the perception of creators, who believe that GAI voice technology does not yet produce sufficiently realistic results. In fact, they describe listening to AI-generated content as "monotonous and dull" due to the lack of "emotional nuances inherent to the human voice."

One example is the newsletter WP a Day (2023), focused on the world of WordPress and created by Antonio Cambronero. His podcast is entirely AI-generated, from the script to the voice synthesis, using Amazon Polly software. He acknowledges that the voice "lacks naturalness" and that achieving full automation requires a "laborious" coding process. Nonetheless, he sees the time saved and cost reduction as positive aspects.

Notably, most of the amateur and professional podcasters in the study use generative voice AI as an experiment to explore its potential in the podcasting field. "The technology is not yet sufficiently developed, but in a couple of years, the landscape will be very different," explains Maria Fitó, who is also a radio and voice-over professional.

"Among industry professionals, we are no longer surprised when a company contacts us to redo a job that GAI has failed to execute properly," adds Fitó. "One of the most common complaints among creators featured in the study is the amount of time wasted on cloning and editing voices," she points out. This is the case of the podcast Joe Rogan AI Experience (2023), which uses cloned voices for both the host and guests. The creator admits that the voices "suffer from cadence issues, which means adding filler sounds like ‘ah’ or ‘um’ to simulate a genuine, paced thought process—in other words, to make it sound like a natural human conversation."

While AI continues to advance through major tech companies, in June 2024, the European Union passed the world's first AI Law, aimed at creating a legal framework to "promote reliable AI across Europe and safeguard citizens' fundamental rights." “What about the rest of the world's AI? Today, few products explicitly state that they are AI-generated,” says Maria Fitó. “Many colleagues have already heard their cloned voices being used without their consent, with their voices being sold in software applications in India, for example,” she adds.

As a teacher, Fitó also highlights the challenge AI poses in the classroom: “We must find new teaching methodologies so that we don’t become mere evaluators of AI output.” She encourages students to “critically assess and verify information.”

The article’s author also warns of the risks GAI poses for younger generations. “Young people will grow up normalising AI-generated voices, and standards will continue to decline,” she says. Nonetheless, Maria Fitó tries to see the positive side of the emergence of AI: “I hope we will come to value human relationships much more; to be able to talk to real people and listen to them,” she concludes.
