Are there any good open source text-to-speech tools?
There are paid services that offer this (e.g. resemble.ai), and a few Colab notebooks that I haven't found very helpful, but I wanted to know whether anyone here has had any luck with free text-to-speech (TTS, T2S) tools. Thank you!
I've been playing with Mimic 3 from Mycroft lately. It's pretty usable out of the box and is self-hostable. https://mycroft.ai/mimic-3/
I actually looked into this a month or so ago. What I was looking for was just reasonable-sounding, simple TTS driven from the CLI (so heavyweight things were out, as were most things with a local server, though I think I looked at some).
I ended up going with pico-tts[0]. I remember looking at a few other things and left myself the following comment:
# checked out mimic as well. Didn't seem great, espeak is like nails on a chalkboard
# haven't checked out marytts or larynx or anything, but this is good enough™
[0] https://github.com/Iiridayn/pico-tts
I have a ton of fun using the "say" program on macOS to write toy programs with my kids, and have often wanted a version that could run on my eldest's Manjaro laptop. Are any of the above analogously simple to use?
I’m interested in the opposite: I want to transcribe meetings at work because my memory and note taking are inadequate.
I’m familiar with things like otter.ai but I am not risking my job by sharing data with something I don’t control.
it's the only one I have any experience with
This doesn't answer the question, but I thought it might be relevant to mention here that I've been using ChatGPT + resemble.ai to create what I believe is the first kids' stories podcast created entirely by AI.
Here's how it works:
- Kid requests a story about a, b, c on www.makedupstories.com
- ChatGPT generates the text for a story, summary, and title
- We send this to resemble.ai (sounds like Tortoise TTS would work just as well), which has a clone of my voice
- the audio file then gets sent to anchor.fm
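The flow above could be sketched roughly like this. It is not the actual code behind the site: `Story`, `make_episode`, and the three callables are hypothetical names, and the real ChatGPT, resemble.ai, and anchor.fm calls are left as stand-ins to plug in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Story:
    """Text produced by the language model for one request."""
    title: str
    summary: str
    text: str


def make_episode(
    request: str,
    write_story: Callable[[str], Story],      # stand-in for the ChatGPT step
    synthesize: Callable[[str], bytes],       # stand-in for the TTS / voice-clone step
    publish: Callable[[Story, bytes], str],   # stand-in for the podcast upload step
) -> str:
    """Run the three stages in order and return whatever `publish` returns."""
    story = write_story(request)
    audio = synthesize(story.text)
    return publish(story, audio)
```

Keeping each stage as a plain callable means the TTS step can be swapped (for example for a self-hosted model) without touching the rest.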
You can listen to example episodes here on Spotify: https://open.spotify.com/show/6liL4T3kJf1scHq134s0mJ
And here on Apple podcasts:
https://podcasts.apple.com/us/podcast/kidscast-kids-stories-...
Bonus points for models that work well offline on mobile devices.
Depends on how you define "good". Espeak-ng, for example, works just fine as such. But the quality of the freely available voices is nothing close to the Siri / Google Assistant / Alexa / whatever standard. Understandable? Yes. Usable? Yes. But "good"? Mmmmm... YMMV.
I have no recommendations, but I'm curious if someone has tried to train a TTS on the data made by one of the commercial services. Generating data would be very cheap, labels perfect, and there would be less noise than in the human datasets.
It's free up to $30 & then cost price after that. It's exceptionally realistic, but can take a bit of time to synthesise as a result.
Founder of dubverse.ai here. As someone who has done production-level TTS (deployed on India's largest news network), I can say there is a lot of room for improvement in terms of intelligibility. Most of these open source toolkits/models offer only a certain quality of TTS, which is IMO good to play around with but damn tough to make sound studio-quality.
If your use case allows for a web API, I've had good experience running OpenTTS[0].
It packages several models, including Coqui AI's TTS which I tend to use the most. There's a handy Docker image, too.
[0] https://github.com/synesthesiam/opentts
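For anyone who wants to try it from a script, here is a minimal sketch of calling the HTTP API from Python. It assumes a container is already running locally on port 5500 and that an `espeak:en` voice is available in the image you pulled; the function names are my own, so adjust as needed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: an OpenTTS container is listening locally on port 5500.
OPENTTS_BASE = "http://localhost:5500"


def build_tts_url(text, voice="espeak:en", base=OPENTTS_BASE):
    """Build the GET URL for the /api/tts endpoint, URL-encoding the query."""
    query = urlencode({"voice": voice, "text": text})
    return f"{base}/api/tts?{query}"


def synthesize(text, out_path, voice="espeak:en", base=OPENTTS_BASE):
    """Fetch the synthesized audio for `text` and write it to `out_path`."""
    with urlopen(build_tts_url(text, voice, base)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize("Hello from OpenTTS.", "hello.wav")` should then leave a WAV file next to the script, provided the server is up.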
Yeah, I have used espeak, Flite, RHVoice, and a couple of others, and they work very well.
Coqui is an open source text-to-speech solution.
I haven’t used it in a while, but I've seen a lot of new features listed over the last year or so.
Give it a try:
https://github.com/coqui-ai/TTS
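If you just want to script it, a rough sketch of wrapping the `tts` command-line tool that the package installs might look like this. It assumes `pip install TTS` has put `tts` on your PATH; the helper names are hypothetical.

```python
import subprocess


def build_coqui_command(text, out_path, model_name=None):
    """Build the argument list for Coqui's `tts` CLI."""
    cmd = ["tts", "--text", text, "--out_path", out_path]
    if model_name is not None:
        # Optional: pick a specific pretrained model instead of the default.
        cmd += ["--model_name", model_name]
    return cmd


def speak_to_file(text, out_path, model_name=None):
    """Run the CLI and raise CalledProcessError if synthesis fails."""
    subprocess.run(build_coqui_command(text, out_path, model_name), check=True)
    return out_path
```

`tts --list_models` prints the available model names; the first run of a given model downloads its weights, so expect a delay.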
Funny how "everybody" is working on the same ChatGPT projects right now (speech-to-text, API integration, TTS...)
Somehow it's nice to observe this trend to start working in other areas.
Not sure if you’re looking to train your own model or just run inference on pretrained models, but if it’s the former, you can find ESPnet, TensorFlowTTS and Coqui on GitHub.
I've had pretty good luck with Flowtron after watching an NVIDIA screencast on it. CPU-only inference performance isn't great though.
When I last compared (about a year ago) Google was the best of the commercial solutions. Is that still the case?
Coqui AI seems very good from the work I've done with it.
Given a URL, this service returns an audio file / stream (in WAV format) that reads out the main content of the webpage.
https://github.com/tslmy/tts
Are there any that use TensorFlow Lite?