Are there any good open source text-to-speech tools?

Question

There are paid services that offer this (e.g. resemble.ai), and a few colab notebooks that I haven't found very helpful, but I wanted to know whether anyone here has had any luck with free text to speech (tts, t2s) tools. Thank you!

woodlander87 · Accepted Answer

I've been playing with mimic 3 from Mycroft lately. It's pretty usable out of the box and is self hostable. https://mycroft.ai/mimic-3/

mijoharas · Answer

I actually looked into this a month or so ago. What I was looking for was just reasonable-sounding, simple TTS driven from the CLI (so heavyweight things were out, as were most things with a local server, though I think I looked at some).

I ended up going with pico-tts[0]. I remember looking at a few other things and left myself the following comment:

  # checked out mimic as well. Didn't seem great, espeak is like nails on a chalkboard
  # haven't checked out marytts or larynx or anything, but this is good enough™

[0] https://github.com/Iiridayn/pico-tts

LarryMullins · Answer

I'm not sure about the licensing of all the models/etc, but Coqui AI's 'TTS' Python package is fairly good. https://github.com/coqui-ai/TTS

cahoot_bird · Answer

It's such an obvious answer that perhaps that is why nobody has mentioned it. But depending on the use, you might try the Web Speech API's speech synthesis. The voices offered depend on the platform: for example, a Windows user might see a Cortana option whereas a Mac user might see Siri.
Demo Here: https://mdn.github.io/dom-examples/web-speech-api/speak-easy...
Read more here https://github.com/mdn/dom-examples/tree/main/web-speech-api

082349872349872 · Answer

I've had good luck with https://github.com/espeak-ng/espeak-ng (for very specific non-English purposes, and I was willing to wrangle IPA).

sampo · Answer

Mimic3 from the Mycroft project https://github.com/MycroftAI/mimic3

n8henrie · Answer

I have a ton of fun using the "say" program on macOS to write toy programs with my kids, and have often wanted a version that could run on my eldest's Manjaro laptop. Are any of the above analogously simple to use?
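One way to get something close to `say` on Linux is a thin wrapper around the `espeak-ng` CLI recommended elsewhere in this thread. What follows is a minimal sketch, assuming `espeak-ng` is installed and on the PATH; the function names `build_say_command` and `say` are made up for illustration, and the voice will sound more robotic than the macOS one.

```python
import shutil
import subprocess


def build_say_command(text, voice="en", words_per_minute=160):
    """Build the argv list for espeak-ng (-v selects the voice, -s the speed)."""
    return ["espeak-ng", "-v", voice, "-s", str(words_per_minute), text]


def say(text, voice="en", words_per_minute=160):
    """Speak text aloud through espeak-ng; raise if it is not installed."""
    if shutil.which("espeak-ng") is None:
        raise RuntimeError("espeak-ng not found on PATH")
    subprocess.run(build_say_command(text, voice, words_per_minute), check=True)
```

Calling `say("Hello from Manjaro")` should then speak the sentence through the default audio device.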

jslakro · Answer

A handy way to do it with Python: https://pyttsx3.readthedocs.io/en/latest/engine.html
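For reference, basic pyttsx3 usage might look like the sketch below. It assumes pyttsx3 is installed (`pip install pyttsx3`) along with a speech backend it can drive, such as espeak on Linux. The `speak` helper and its optional `engine` parameter are illustrative, not part of the library.

```python
def speak(text, rate=150, engine=None):
    """Speak text with pyttsx3; pass an existing engine to reuse it."""
    if engine is None:
        import pyttsx3  # third-party package, imported lazily

        engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    engine.say(text)                  # queue the utterance
    engine.runAndWait()               # block until the queue has been spoken
    return engine
```

`speak("Hello there")` creates an engine on first use. Keeping the returned engine and passing it back in avoids re-initialising the backend for every sentence.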

cloverlake · Answer

I've had good results with larynx: https://github.com/rhasspy/larynx

smcameron · Answer

pico2wave with the -l=en-GB flag to get the British lady voice is not too bad for offline free TTS. You can hear it in this video: https://www.youtube.com/watch?v=tfcme7maygw&t=45s
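If you want to drive pico2wave from a script, a small wrapper could look like this. It is a sketch that assumes `pico2wave` is on the PATH; the helper names `build_pico_command` and `synthesize` are hypothetical. pico2wave only writes a WAV file, so playback is left to whatever player you have.

```python
import subprocess


def build_pico_command(text, wav_path, lang="en-GB"):
    """Build the argv list for pico2wave (-l language, -w output WAV file)."""
    return ["pico2wave", "-l", lang, "-w", wav_path, text]


def synthesize(text, wav_path, lang="en-GB"):
    """Render text to a WAV file at wav_path using pico2wave."""
    subprocess.run(build_pico_command(text, wav_path, lang), check=True)
```

For example, `synthesize("Hello there", "/tmp/hello.wav")` should leave a WAV file that can be played with a tool such as `aplay`.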

mozman · Answer

I'm interested in the opposite: I want to transcribe meetings at work because my memory and note taking are inadequate. I'm familiar with things like otter.ai but I am not risking my job by sharing data with something I don't control.

windthrown · Answer

I have heard good things about Mozilla's TTS: https://github.com/mozilla/TTS

inoffensivename · Answer

Does festival count as good these days? https://www.cstr.ed.ac.uk/projects/festival/ It's the only one I have any experience with.

richardfeynman · Answer

This doesn't answer the question, but I thought it might be relevant to mention here that I've been using ChatGPT + resemble.ai to create what I believe is the first kids' stories podcast created entirely by AI.
Here's how it works:
- Kid requests a story about a, b, c on www.makedupstories.com
- ChatGPT generates the text for a story, summary, and title
- we send this to resemble.ai (sounds like Tortoise TTS would work just as well), which has a clone of my voice
- the audio file then gets sent to anchor.fm
You can listen to example episodes here on Spotify: https://open.spotify.com/show/6liL4T3kJf1scHq134s0mJ
And here on Apple podcasts: https://podcasts.apple.com/us/podcast/kidscast-kids-stories-...

thom · Answer

Bonus points for models that work well offline on mobile devices.

mindcrime · Answer

Depends on how you define "good". Espeak-ng, for example, works just fine as such. But the quality of the freely available voices is nothing close to the Siri / Google Assistant / Alexa / whatever standard. Understandable? Yes. Usable? Yes. But "good"? Mmmmm... YMMV.

TylerLives · Answer

I have no recommendations, but I'm curious if someone has tried to train a TTS on the data made by one of the commercial services. Generating data would be very cheap, labels perfect, and there would be less noise than in the human datasets.

tetmin · Answer

We just released & open sourced this as a UI & API: https://tts.themetavoice.xyz/ It's free up to $30 & then cost price after that. It's exceptionally realistic, but can take a bit of time to synthesise as a result.

amelius · Answer

Papers-with-code would be the first place to look: https://paperswithcode.com/task/text-to-speech-synthesis

var_cw · Answer

Founder of dubverse.ai here. As someone who has done production-level TTS (deployed on India's largest news network), I can say there is a lot of room for improvement in terms of intelligibility. Most of these open source toolkits/models offer only a certain quality of TTS, which is IMO good to play around with, but it is damn tough to make them sound studio-quality.

geenat · Answer

https://github.com/gnat/text-to-speech-ubuntu

mmcwilliams · Answer

If your use case allows for a web API, I've had good experience running OpenTTS[0].
It packages several models, including Coqui AI's TTS which I tend to use the most. There's a handy Docker image, too.
[0] https://github.com/synesthesiam/opentts
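As a rough illustration of the web API approach, the sketch below requests a WAV file from a locally running OpenTTS server. The port (5500), the `/api/tts` path, the `voice` and `text` query parameters, and voice ids like `espeak:en` are assumptions based on my reading of the project README, so check them against the version you run. The function names are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, voice, base_url="http://localhost:5500"):
    """Build the request URL; endpoint and parameter names are assumptions."""
    return f"{base_url}/api/tts?" + urlencode({"voice": voice, "text": text})


def fetch_wav(text, voice, out_path, base_url="http://localhost:5500"):
    """Download the synthesized audio and write it to out_path."""
    with urlopen(build_tts_url(text, voice, base_url)) as response:
        audio = response.read()
    with open(out_path, "wb") as wav_file:
        wav_file.write(audio)
```

With the server running, `fetch_wav("Hello there", "espeak:en", "hello.wav")` should save the spoken sentence to `hello.wav`.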

culi · Answer

There's... the web platform. No really, there's a SpeechSynthesis API: https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynth...

blacklight · Answer

I've been quite happy with Mimic3 lately (https://github.com/MycroftAI/mimic3), the engine that powers Mycroft. It also comes with an easy-to-install Docker image.

vram22 · Answer

Old, but may be of interest: "Speech synthesis in Python with pyttsx" https://jugad2.blogspot.com/2014/03/speech-synthesis-in-pyth...

friend_and_foe · Answer

Yeah, I have used espeak, Flite, RHVoice, and a couple of others, and they work very well.

dopidopHN · Answer

Coqui is an open source text-to-speech solution. I haven't used it in a while, but I've seen a lot of new features listed over the last year or so. Give it a try: https://github.com/coqui-ai/TTS

okokwhatever · Answer

Funny how "everybody" is working on the same ChatGPT projects right now (speech-to-text, API integration, TTS...). Somehow it's nice to observe this trend and start working in other areas.

nmfisher · Answer

Not sure if you're looking to train your own model or just run inference on pretrained models, but if it's the former, you can find espnet, TensorflowTTS and coqui on GitHub.

brianshaler · Answer

I've had pretty good luck with flowtron after watching an NVIDIA screencast on it. CPU-only inference performance isn't great though.

IceHegel · Answer

When I last compared (about a year ago) Google was the best of the commercial solutions. Is that still the case?

metiscus · Answer

Coqui AI seems very good from the work I've done with it.

tslmy · Answer

Given a URL, this service returns an audio file / stream (in WAV format) that reads out the main content of the webpage. https://github.com/tslmy/tts

infinite8s · Answer

Are there any that use TensorFlow Lite?

theCrowing · Answer

The best is probably tortoise, but you have to run it yourself: https://github.com/neonbjb/tortoise-tts
Here are some demos: https://nonint.com/static/tortoise_v2_examples.html
