Courtesy of Kelly Sikkema via Unsplash

In a March 29 blog post, OpenAI revealed its AI voice cloner, Voice Engine, which can recite a text input in the voice of any person, given a 15-second audio sample. The audio it generates is original, and the accent in the sample carries over into clips generated in any other language. OpenAI has exhibited AI-generated audio in English, Spanish, Mandarin, German, French, Japanese, Swahili, Sheng, and Portuguese.

Voice Engine can also generate unique human-sounding voices from scratch, without sample audio. Any voice created can be used across languages, enabling people who have always been nonverbal to choose a consistent voice for communication. Early applications include this use, along with human-sounding reading assistance, content translation, and restoring a digital voice to a person who had lost theirs, using only a poor-quality 15-second recording from a school project.

Voice Engine works by analyzing speech and text data simultaneously, an OpenAI product staff member told TechCrunch. The analysis relies on diffusion and a transformer neural network, but it doesn’t require building a full speech model from the audio sample.

OpenAI began developing Voice Engine in 2022, and the model already powers its text-to-speech API, ChatGPT Voice, and Read Aloud. OpenAI has also tested Voice Engine with around 10 companies to minimize risk and maximize societal benefit.

OpenAI wasn’t the first to create and release an AI voice cloner. Voice cloning startup ElevenLabs was founded in 2022 and publicly launched a beta version of its service in January 2023. Many competitors are not far behind.

A realistic way to impersonate voices presents obvious risks, and misuse has already happened. Most prominently, in January, a Democratic operative sent several robocalls that used an AI-generated Joe Biden voice to tell New Hampshire residents not to vote in the primary. The individual behind the calls admitted he used an AI-generated Biden voice without authorization, but said he did it to spotlight the risks of generative AI and promote measures to mitigate them. The New Hampshire Attorney General’s Office is investigating what could be a violation of state voter suppression law, and the perpetrator said he may face jail time, but he says he won’t apologize for what he calls an act of civil disobedience.

In OpenAI’s current Voice Engine rollout to a limited group of partners, partners must label all recordings as AI-generated, and OpenAI subtly watermarks these recordings. OpenAI proposes building a system that ensures the approval of people whose voices are being cloned, automatically blocking the cloning of famous people’s voices, and ending voice-based authentication. “We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities,” OpenAI says in its blog post. “Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”
