OpenAI reveals impressive voice cloning model, and it’s scary good

Key Takeaways

  • OpenAI’s Voice Engine can create realistic synthetic audio (including in other languages) from just 15 seconds of sample audio.
  • Potential benefits of Voice Engine include reading assistance, translation for global reach, and therapy for non-verbal individuals.
  • Despite the technology’s capabilities, OpenAI is holding off on releasing Voice Engine due to privacy and consent concerns.



Microsoft-backed OpenAI is perhaps best known for ChatGPT, its conversational AI model that made waves back when it launched publicly in 2022, and is still highly impressive to this day. Since then, the firm has also unveiled Sora, an AI model that can generate video clips using just textual input. While Sora is yet to become available publicly, OpenAI has now announced yet another AI model, and this time, it’s capable of generating synthetic audio.


What’s special about OpenAI’s latest model?

The highlight of OpenAI’s latest invention is that it can generate realistic synthetic audio using just 15 seconds of sample audio input. It can even generate audio in other languages by mimicking the sound patterns of the original sample. Dubbed Voice Engine, this model is quite small, which makes its audio cloning capabilities all the more impressive.


OpenAI has been working on this project since at least 2022, and it’s the technology that powers its text-to-speech API as well as ChatGPT Voice and Read Aloud. Over on its website, the company has shared impressive examples in which the model generates extremely realistic audio clips on various topics from just 15 seconds of sample audio on an unrelated topic.

What are the potential benefits of Voice Engine?

[Image: Sam Altman sitting at a TechCrunch event]


OpenAI has shared several potential applications of Voice Engine. It can be used to provide reading assistance to non-readers, translate content to reach global audiences, and offer therapeutic services for people who are non-verbal. All the aforementioned scenarios have already been trialed by OpenAI in a private preview conducted with select partners on a small scale.

When is OpenAI releasing Voice Engine?

But perhaps the most interesting part of OpenAI’s latest announcement is that the firm isn’t ready to release Voice Engine to the public just yet. The reason is a safety concern: someone’s voice could be cloned without their consent, which is extremely problematic, especially in the U.S., where 2024 is an election year. During its private preview, OpenAI ensured that its partners agreed to its usage policies, which included using someone’s audio only with the individual’s explicit consent, clearly disclosing when synthetic audio is being used, and digitally watermarking content generated by the model.


OpenAI will only release Voice Engine once (or if) it reaches an agreement regarding safeguards for the model. Until then, the company has emphasized that the world needs to understand where the technology is headed. For now, it has encouraged banks to phase out voice-based authentication as a security measure, and has asked the community at large to educate itself about deceptive AI content, explore policies that protect the use of an individual’s voice, and implement mechanisms that enable anyone to identify whether a voice is human- or AI-generated.
