With Koe Recast, you can change your voice as easily as your clothing


A colorful waveform that actually has nothing to do with Koe: Recast.
A colorful waveform dramatically swirls through latent space, seeking kawaii.


Thanks to a web demo of a new AI tool called Koe Recast, you can transform up to 20 seconds of your voice into different styles, including an anime character, a deep male narrator, an ASMR whisper, and more. It’s an eye-opening preview of a potential commercial product currently undergoing private alpha testing.

Koe Recast emerged recently from a Texas-based developer named Asara Near, who is working independently to develop a desktop app with the aim of allowing people to change their voices in real time through other apps like Zoom and Discord. “My goal is to help people express themselves in any way that makes them happier,” said Near in a brief interview with Ars.

Several demos on the Koe website show altered clips of Mark Zuckerberg talking about augmented reality with a female voice, a deep male narrator voice, and a high-pitched anime voice, all powered by Recast.

This kind of realistic AI-powered voice transformation technology isn’t new. Google made waves with similar tech in 2018, and audio deepfakes of celebrities have caused controversy for several years now. But seeing this capability in an independent startup funded by one person (“I’ve funded this project entirely by myself thus far,” Near said) shows how far AI vocal synthesis tech has come and perhaps hints at how close voice transformation might be to widespread adoption through a low-cost or open source release.

If that happens, Koe Recast could follow the path of Stable Diffusion by putting realistic audio deepfakes into the hands of many without hard restrictions. “We’re exploring some monetization strategies,” Near said. “If the profit models I have in mind don’t work out, open-sourcing this technology may be an option in the future.”

As deep learning technology continues to peel away the 20th-century concept (or some might say “illusion”) of media as a fixed and accurate record of reality, we are looking at a near future in which digital representations of a living human’s voice, much like images and video, will be one more thing you can’t take at face value without significant trust in the source. Still, the technology could empower many people who might otherwise be discriminated against while doing business, or simply having fun, online.
