HACKER Q&A
📣 fumblebee

Are there any good open source text-to-speech tools?


There are paid services that offer this (e.g. resemble.ai), and a few colab notebooks that I haven't found very helpful, but I wanted to know whether anyone here has had any luck with free text to speech (tts, t2s) tools. Thank you!


  👤 woodlander87 Accepted Answer ✓
I've been playing with mimic 3 from Mycroft lately. It's pretty usable out of the box and is self hostable. https://mycroft.ai/mimic-3/

👤 mijoharas
I actually looked into this a month or so ago. What I wanted was a simple, reasonable-sounding TTS driven from the CLI, so heavyweight tools were out, as were most things that need a local server (though I think I looked at a few).

I ended up going with pico-tts[0]. I remember looking at a few other things and left myself the following comment:

  # checked out mimic as well. Didn't seem great, espeak is like nails on a chalkboard
  # haven't checked out marytts or larynx or anything, but this is good enough™
[0] https://github.com/Iiridayn/pico-tts

👤 LarryMullins
I'm not sure about the licensing of all the models etc., but Coqui AI's 'TTS' Python package is fairly good.

https://github.com/coqui-ai/TTS
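
For anyone who wants to try the package from Python, here is a minimal sketch, not an official recipe. It assumes the package is installed (`pip install TTS`) and that `TTS.api.TTS` with `tts_to_file` is available in your release. The `tts_factory` parameter is a hypothetical hook added only so the function can be exercised without downloading a model.

```python
def synthesize_to_file(text, out_path, model_name, tts_factory=None):
    """Synthesize `text` into a WAV file at `out_path` with a Coqui model.

    `tts_factory` is a hypothetical injection point for this sketch; by
    default the library's own TTS class is imported and used.
    """
    if tts_factory is None:
        # Imported lazily so the module loads even without the package.
        from TTS.api import TTS  # assumption: installed via `pip install TTS`
        tts_factory = TTS
    tts = tts_factory(model_name=model_name)  # may download the model on first use
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```

A call might look like `synthesize_to_file("Hello there", "hello.wav", "tts_models/en/ljspeech/tacotron2-DDC")`. Model names change between releases, so check the list shipped with your installed version, and check each model's license separately.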


👤 cahoot_bird
It's such an obvious answer that perhaps that's why nobody has mentioned it, but depending on the use case, you might try the Web Speech API's speech synthesis. The voices on offer depend on the platform: for example, a Windows user might see a Cortana option whereas a Mac user might see Siri.

Demo Here: https://mdn.github.io/dom-examples/web-speech-api/speak-easy...

Read more here https://github.com/mdn/dom-examples/tree/main/web-speech-api


👤 082349872349872
I've had good luck with https://github.com/espeak-ng/espeak-ng (for very specific non-English purposes, and I was willing to wrangle IPA)

👤 sampo
Mimic3 from the Mycroft project https://github.com/MycroftAI/mimic3

👤 n8henrie
I have a ton of fun using the "say" program on macOS to write toy programs with my kids, and have often wanted a version that could run on my eldest's Manjaro laptop. Are any of the above analogously simple to use?
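
One way to get a `say`-like helper that works on both machines is a thin wrapper that picks whichever speech command is installed. This is only a sketch: the backend list and its ordering are arbitrary choices, and it assumes `espeak-ng` (or `espeak`) is installed from the distribution's packages on Linux.

```python
import shutil
import subprocess

# Candidate backends in order of preference (an arbitrary choice for this
# sketch). Each maps an executable name to a function building its argv.
BACKENDS = {
    "say": lambda text: ["say", text],              # macOS built-in
    "espeak-ng": lambda text: ["espeak-ng", text],  # plays through the default audio device
    "espeak": lambda text: ["espeak", text],
}


def build_say_command(text, which=shutil.which):
    """Return the argv for the first installed backend, or None if none is found."""
    for name, make_argv in BACKENDS.items():
        if which(name) is not None:
            return make_argv(text)
    return None


def say(text):
    """Speak `text` aloud; raise if no supported backend is installed."""
    argv = build_say_command(text)
    if argv is None:
        raise RuntimeError("no TTS backend found (tried: %s)" % ", ".join(BACKENDS))
    subprocess.run(argv, check=True)
```

With that in place, a toy program only needs `say("hello")`. Text that begins with a dash may be read as an option by the underlying command, so a real version should guard against that.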

👤 cloverlake
I've had good results with larynx: https://github.com/rhasspy/larynx

👤 smcameron
pico2wave with the -l=en-GB flag to get the British lady voice is not too bad for offline free TTS. You can hear it in this video: https://www.youtube.com/watch?v=tfcme7maygw&t=45s
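
A minimal sketch of driving pico2wave from a script, assuming the binary is on PATH and accepts the long options `--lang` and `--wave` (the long forms of `-l` and `-w`). The function names are made up for illustration.

```python
import shutil
import subprocess


def pico2wave_command(text, wav_path, lang="en-GB"):
    """Build the argv for pico2wave: --lang picks the voice, --wave the output file."""
    return ["pico2wave", "--lang=" + lang, "--wave=" + wav_path, text]


def synthesize(text, wav_path, lang="en-GB"):
    """Write `text` as speech to `wav_path`; requires pico2wave on PATH."""
    if shutil.which("pico2wave") is None:
        raise RuntimeError("pico2wave is not installed")
    subprocess.run(pico2wave_command(text, wav_path, lang), check=True)
```

pico2wave only writes a file, so playback needs a separate player such as `aplay`. As far as I know, the output name has to end in `.wav`.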

👤 mozman
I’m interested in the opposite: I want to transcribe meetings at work because my memory and note taking are inadequate.

I’m familiar with things like otter.ai but I am not risking my job by sharing data with something I don’t control.


👤 windthrown
I have heard good things about Mozilla's TTS: https://github.com/mozilla/TTS

👤 inoffensivename
does festival count as good these days? https://www.cstr.ed.ac.uk/projects/festival/

it's the only one I have any experience with


👤 richardfeynman
This doesn't answer the question, but I thought it might be relevant to mention here that I've been using chatGPT + resemble.ai to create what I believe is the first kid's stories podcast created entirely by AI.

Here's how it works:

- Kid requests a story about a, b, c on www.makedupstories.com

- chatGPT generates the text for a story, summary, and title

- we send this to resemble.ai (sounds like Tortoise TTS would work just as well), which has a clone of my voice

- the audio file then gets sent to anchor.fm

you can listen to example episodes here on Spotify: https://open.spotify.com/show/6liL4T3kJf1scHq134s0mJ

And here on Apple podcasts: https://podcasts.apple.com/us/podcast/kidscast-kids-stories-...
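
The four steps listed above could be wired together roughly as follows. This is a sketch of the flow only: `write_story`, `synthesize` and `publish` are hypothetical callables standing in for the ChatGPT, resemble.ai and anchor.fm steps, and none of those services' real APIs are shown.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Episode:
    title: str
    summary: str
    text: str
    audio: bytes


def make_episode(
    request: str,
    write_story: Callable[[str], Tuple[str, str, str]],  # hypothetical: returns (title, summary, text)
    synthesize: Callable[[str], bytes],                   # hypothetical: story text -> audio bytes
    publish: Callable[[Episode], None],                   # hypothetical: uploads the finished episode
) -> Episode:
    """Run one story request through text generation, TTS and publishing."""
    title, summary, text = write_story(request)
    audio = synthesize(text)
    episode = Episode(title=title, summary=summary, text=text, audio=audio)
    publish(episode)
    return episode
```

Passing the three steps in as functions makes it easy to swap the TTS backend, for example to try Tortoise TTS in place of resemble.ai.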


👤 thom
Bonus points for models that work well offline on mobile devices.

👤 mindcrime
Depends on how you define "good". Espeak-ng, for example, works just fine as such. But the quality of the freely available voices is nothing close to the Siri / Google Assistant / Alexa / whatever standard. Understandable? Yes. Usable? Yes. But "good"? Mmmmm... YMMV.

👤 TylerLives
I have no recommendations, but I'm curious if someone has tried to train a TTS on the data made by one of the commercial services. Generating data would be very cheap, labels perfect, and there would be less noise than in the human datasets.

👤 tetmin
We just released & open sourced this as a UI & API: https://tts.themetavoice.xyz/

It's free up to $30 & then cost price after that. It's exceptionally realistic, but can take a bit of time to synthesise as a result.


👤 amelius
Papers-with-code would be the first place to look:

https://paperswithcode.com/task/text-to-speech-synthesis


👤 var_cw
Founder of dubverse.ai here. As someone who has done production-level TTS (deployed on India's largest news network), I can say there is a lot of room for improvement in terms of intelligibility. Most of these open source toolkits/models offer only a certain level of quality, which IMO is good to play around with but damn tough to get sounding studio-quality.


👤 mmcwilliams
If your use case allows for a web API, I've had good experience running OpenTTS[0].

It packages several models, including Coqui AI's TTS which I tend to use the most. There's a handy Docker image, too.

[0] https://github.com/synesthesiam/opentts
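
A small client sketch for a locally running OpenTTS container. It assumes the server listens on `localhost:5500` and exposes `GET /api/tts` with `voice` and `text` query parameters, which is how I remember the README describing it. Verify the port, path and voice IDs against your own instance.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def opentts_url(text, voice, base="http://localhost:5500"):
    """Build the request URL; port and path are assumptions about a default setup."""
    return base + "/api/tts?" + urlencode({"voice": voice, "text": text})


def fetch_wav(text, voice, out_path, base="http://localhost:5500"):
    """Request synthesized speech and save the response body to `out_path`."""
    with urlopen(opentts_url(text, voice, base)) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

The voice ID used in any call (for example something of the form `engine:voice`) depends on which voices your server has loaded.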


👤 culi
There's... the web platform. No really, there's a SpeechSynthesis API:

https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynth...


👤 blacklight
I've been quite happy with Mimic3 lately (https://github.com/MycroftAI/mimic3), the engine that powers Mycroft. It also comes with an easy-to-install Docker image.

👤 vram22
Old, but may be of interest:

Speech synthesis in Python with pyttsx

https://jugad2.blogspot.com/2014/03/speech-synthesis-in-pyth...
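
The linked post covers the original pyttsx. The sketch below swaps in pyttsx3, a maintained fork with the same style of API, and assumes it is installed (`pip install pyttsx3`). The `engine` parameter is a hypothetical hook so the function can be exercised without audio hardware.

```python
def speak(text, rate=None, engine=None):
    """Speak `text` through the platform's speech engine via pyttsx3.

    `engine` is a hypothetical injection point for this sketch; by default
    a real pyttsx3 engine is created.
    """
    if engine is None:
        import pyttsx3  # assumption: installed via `pip install pyttsx3`
        engine = pyttsx3.init()
    if rate is not None:
        engine.setProperty("rate", rate)  # speaking rate in words per minute
    engine.say(text)       # queue the utterance
    engine.runAndWait()    # block until the queue has been spoken
```

On Linux, pyttsx3 typically drives espeak, so the voice quality is whatever the underlying engine provides.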


👤 friend_and_foe
Yeah, I have used espeak, Flite, RHVoice, and a couple of others, and they work very well.

👤 dopidopHN
Coqui is an open source text-to-speech solution.

I haven’t used it in a while, but I've seen a lot of new features listed over the last year or so.

Give it a try

https://github.com/coqui-ai/TTS


👤 okokwhatever
Funny how "everybody" is working on the same ChatGPT projects right now (speech-to-text, API integration, TTS...)

Somehow it's nice to observe this trend and take it as a cue to start working in other areas.


👤 nmfisher
Not sure if you’re looking to train your own model or just run inference on pretrained models, but if it’s the former, you can find ESPnet, TensorFlowTTS and Coqui on GitHub.

👤 brianshaler
I've had pretty good luck with Flowtron after watching an NVIDIA screencast on it. CPU-only inference performance isn't great, though.

👤 IceHegel
When I last compared (about a year ago) Google was the best of the commercial solutions. Is that still the case?

👤 metiscus
Coqui AI seems very good from the work I've done with it.

👤 tslmy
Given a URL, this service returns an audio file or stream (in WAV format) that reads out the main content of the webpage.

https://github.com/tslmy/tts


👤 infinite8s
Are there any that use TensorFlow Lite?

👤 theCrowing
The best is probably Tortoise, but you have to run it yourself: https://github.com/neonbjb/tortoise-tts

Here are some demos: https://nonint.com/static/tortoise_v2_examples.html