OpenAI reveals impressive voice cloning model, and it’s scary good – XDA Developers

Posted: March 31, 2024 at 5:50 am

Microsoft-backed OpenAI is perhaps best known for ChatGPT, its conversational AI model that made waves back when it launched publicly in 2022, and is still highly impressive to this day. Since then, the firm has also unveiled Sora, an AI model that can generate video clips using just textual input. While Sora is yet to become available publicly, OpenAI has now announced yet another AI model, and this time, it's capable of generating synthetic audio.

The highlight of OpenAI's latest invention is that it can generate realistic synthetic audio using just 15 seconds of sample audio input. It can even generate audio in other languages by mimicking the sound patterns of the original sample. Dubbed Voice Engine, this model is quite small, which makes its audio cloning capabilities all the more impressive.

OpenAI has been working on this project since at least 2022, and the technology already powers its text-to-speech API as well as ChatGPT Voice and Read Aloud. On its website, the company has published impressive examples in which the model generates extremely realistic audio on various topics from just 15 seconds of sample audio on an unrelated topic.

OpenAI has shared several potential applications of Voice Engine. It can be used to provide reading assistance to non-readers, translate content to reach global audiences, and offer therapeutic services for people who are non-verbal. All the aforementioned scenarios have already been trialed by OpenAI in a private preview conducted with select partners on a small scale.

But perhaps the most interesting part of OpenAI's latest announcement is that the firm isn't ready to release Voice Engine to the public just yet. The reason is safety: someone's voice could be cloned without their consent, which is extremely problematic, especially in the U.S., where 2024 is an election year. During its private preview, OpenAI required partners to agree to its usage policies, which included using someone's audio only with the individual's explicit consent, clearly disclosing when synthetic audio is being used, and digitally watermarking content generated by the model.

OpenAI will only release Voice Engine once (or if) it reaches an agreement regarding safeguards for the model. Until then, the company has emphasized that the world needs to understand where the technology is headed. For now, it has encouraged banks to phase out voice-based authentication as a security measure, and asked the community at large to educate itself about deceptive AI content, explore policies to safeguard the use of an individual's voice, and implement mechanisms that enable anyone to identify whether a voice is human- or AI-generated.

