Adobe Turns Up the Volume on AI With New Ways to Generate Soundtracks and Audio

Adobe's hub for all things AI, Firefly, is central to its latest innovations. The company announced a ton of AI-powered updates at its Max creative conference on Tuesday. While the rest of us have been obsessing (and worrying) over OpenAI's new Sora AI slop app, Adobe is headed in a different direction: Its newest features are for generating AI audio.

Adobe was the second big tech company to introduce AI-generated audio to its AI video model, following Google's Veo 3. Its previous AI audio tool was primarily focused on sound effects. With that tool, you could record yourself roaring like a monster, and the AI would keep the cadence of your recording while beefing up the sound. Now, Adobe is building on its audio tools and introducing new ones.

Generate soundtrack and generate speech do exactly what they suggest: You can create background music and turn scripts into narration for your video. But each comes with industry-first perks that make them enticing for any creator. They're available in beta now.

Adobe is also releasing its latest, fifth-generation Firefly Image Model. It's better at producing photorealistic images, and you can now use prompt-based editing. There's also a new Firefly video editor, a multitrack timeline that's meant to help you manage AI-generated clips. Adobe is expanding its partnerships with two new AI companies, ElevenLabs and Topaz Labs. And with Adobe, you'll also be able to create your own custom AI models. For even more AI news, you can learn about the AI assistants coming to Photoshop and Express.

Generating speech

Generating speech in Firefly is straightforward, and the tool includes a lot of features that'll make it useful for nearly any project. It's a simple window where you can type in the words you want the AI voice to read. You can also upload a script of up to 7,500 characters -- roughly a 15- to 20-minute video. Once your script is uploaded, you can choose from 50 voices, each tagged with an approximate age and gender, including nonbinary options. You can generate speech in 20 different languages. But the fun part is what you can do to fine-tune your prompt.

Speech is more than just reading words on a page. When we read long passages or talk with others, we naturally add emphasis, emotion and rhythm to our speech. With the new program, you can do the same, adding pauses where you want the AI to take a breather and highlighting sections where the tone should shift.

If you're like me and nobody pronounces your name right on the first try, you can use the "fix pronunciation" tool to ensure there aren't any flubs. Select the name or proper noun and then add a phonetic breakdown, and the AI will use that to smooth out the pronunciation.

These tools, along with your hands-on ability to adjust specific sections, are meant to give you more control, something other text-to-speech programs don't always offer.

"It's a way for us to provide lifelike speech to creators, to small business owners, to educators, to everybody that really just has a story to tell, and maybe they're not as comfortable as we are just pulling out a mic and talking," Jay LeBoeuf, Adobe's head of AI audio, said in an interview.
