Elyse Betters Picaro / ZDNET
ZDNET's key takeaways
There are now several AI tools available that can generate humanlike speech.
Some AI voices can now whisper, laugh, and perform other expressive feats.
TTS tools vary in terms of their level of realism and their intended audiences.
Synthetic voices generated by artificial intelligence are, for better or worse, becoming commonplace. Meanwhile, the number of companies developing this technology is growing rapidly.
Recent innovations in AI, such as generative adversarial networks (GANs), diffusion models, and the transformer architecture -- which forms the backbone of many generative AI tools, including large language models -- have led to the rise of AI systems that can convert text prompts into natural-sounding artificial speech. There is now a wide variety of these text-to-speech (TTS) systems available, each with its own benefits and shortcomings.
To gain a clearer sense of which are the most advanced, I tested three of the most popular free TTS tools currently on the market.
ElevenLabs
ElevenLabs is widely considered an industry leader in voice realism, and I found this to be a reasonably accurate assessment in my own experiments with the company's TTS tool. But that realism feels closer to the voice of a trained voice actor or professional podcaster than to ordinary human conversation -- it's almost a little too polished. That polish, however, tends to make it the preferred choice for many businesses and professionals looking for reliable automated narration. It also supports more than 20 languages, further expanding the platform's reach and appeal.