Ask HN: What is the state of OSS voice cloning?
I am super impressed by the quality of voice cloning offered by ElevenLabs and Play.ai. I frequently see impressive OSS demos on social media, but last weekend I took a few popular ones for a spin and the quality wasn't even close to the proprietary models.

https://github.com/coqui-ai/tts
https://github.com/serp-ai/bark-with-voice-clone
https://github.com/metavoiceio/metavoice-src
https://github.com/myshell-ai/OpenVoice
https://github.com/collabora/WhisperSpeech
https://github.com/neonbjb/tortoise-tts

Has anyone else had success with these? Are there other projects I should look at?

After spending a bit more time with these models, I wrote up my findings in more detail if anyone is interested in learning more.

https://www.ddmckinnon.com/2024/10/03/dans-weekly-ai-speech-...

This is a bit different. These audio clips use the default voice of each of these systems. I was asking about zero-shot voice cloning, i.e. transferring a recorded voice and synthesizing speech in that voice.

I tried zero-shot voice cloning in all of the top OSS models in the Arena and performance was bad.

Most of those models DO do zero-shot cloning. The best is VoiceCraft. It's nearly 11Labs quality. Check it out.

Thanks for the flag. VoiceCraft is indeed the best zero-shot OSS voice cloning tool, despite appearing at the bottom of the TTS Arena. They have a really easy-to-use Gradio demo on their repo if anyone else wants to give it a try.

There is still a big gap between VoiceCraft and 11Labs or Character.ai, and the VoiceCraft voices would not be confused for the real speaker, but this is much closer.