HACKER Q&A
📣 apples_oranges

Best speech synthesis library/service?


I use text-to-speech a lot, and in my experience naturalreaders.com and the ChatGPT interface are the best. But there must be something free out there that's just as good? I basically want to run shell scripts that convert website content to good spoken audio. The built-in Mac "Start Speaking" feature doesn't cut it, for example. Too robotic.


  👤 skygazer Accepted Answer ✓
I like piper. On my Mac, I have a one-liner triggered by a mouse/trackpad gesture that copies any selected text, sanitizes it, and pipes it to piper, which in turn pipes a raw audio stream into IINA, where I can pause or rewind. It also sets the title of the IINA window to the page title, window title, or app name of the text source, so that I can have multiple readers paused and tell them apart.
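
For anyone who wants to try something similar, here is a rough sketch of the text -> piper -> player chain in Python. It is not the actual one-liner described above. The voice file name is a placeholder, the 22050 Hz sample rate is an assumption that should be checked against the voice's config, and mpv is swapped in as the player because its raw-audio options are the ones I'm sure of (IINA is built on mpv, but its command-line flags aren't shown here).

```python
import shutil
import subprocess

# Assumptions: both values below are placeholders, not taken from the setup above.
VOICE_MODEL = "en_US-lessac-medium.onnx"  # hypothetical local voice file
SAMPLE_RATE = 22050  # common for medium-quality piper voices; check the voice's .json


def piper_command(model: str) -> list[str]:
    """Build the piper call: text in on stdin, raw 16-bit mono PCM out on stdout."""
    return ["piper", "--model", model, "--output-raw"]


def player_command(title: str, rate: int = SAMPLE_RATE) -> list[str]:
    """Build an mpv call that plays raw PCM from stdin under the given window title."""
    return [
        "mpv",
        f"--title={title}",
        "--demuxer=rawaudio",
        f"--demuxer-rawaudio-rate={rate}",
        "--demuxer-rawaudio-channels=1",
        "--demuxer-rawaudio-format=s16le",
        "-",  # read the audio stream from stdin
    ]


def speak(text: str, title: str, model: str = VOICE_MODEL) -> None:
    """Stream already-cleaned text through piper into the player."""
    if not (shutil.which("piper") and shutil.which("mpv")):
        raise RuntimeError("piper and mpv must both be on PATH")
    synth = subprocess.Popen(
        piper_command(model), stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    player = subprocess.Popen(player_command(title), stdin=synth.stdout)
    synth.stdout.close()  # the player now owns the read end of the pipe
    synth.stdin.write(text.encode("utf-8"))
    synth.stdin.close()  # signal end of input so piper can finish
    player.wait()
    synth.wait()
```

Calling `speak("Hello there.", title="Example page")` should open a player window titled "Example page", provided both tools are installed. Passing the title per call is what lets several paused readers be told apart.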

Some of the piper voices are a step up from Apple’s. But I also have it automatically cycle through my preferred voices, which reduces monotony. piper is free, on GitHub, and doesn’t use much CPU. I had to comb through many voices to find the most realistic and natural-sounding ones. (Some of the truly human-sounding voices would hallucinate gibberish occasionally, so I abandoned those.)
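
The voice cycling could be done in many ways. One minimal sketch, assuming a small JSON state file that remembers which voice comes next between runs (the voice file names and the state-file approach are illustrative assumptions, not the setup described above):

```python
import json
from pathlib import Path

# Hypothetical voice files; substitute whichever voices you settled on.
PREFERRED_VOICES = ["voice-a.onnx", "voice-b.onnx", "voice-c.onnx"]


def next_voice(state_file: Path, voices: list[str] = PREFERRED_VOICES) -> str:
    """Return the next voice in round-robin order and persist the position."""
    try:
        index = int(json.loads(state_file.read_text())["index"])
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        index = 0  # missing or unreadable state: start from the first voice
    voice = voices[index % len(voices)]
    state_file.write_text(json.dumps({"index": (index + 1) % len(voices)}))
    return voice
```

Each invocation of the script would call `next_voice(Path.home() / ".tts_voice_state.json")` (a made-up path) and hand the result to piper's `--model` option, so consecutive articles are read by different voices.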

I set this up because Apple’s reader would choke on text with special characters and abort reading, or read bullet point markers aloud strangely, or suddenly start reading all subsequent English text as if it were French after encountering a French name.
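
The sanitizing step is where most of those problems can be headed off. A minimal sketch of what it might look like, assuming the goal is to drop list markers and invisible or control characters before the text reaches the synthesizer (the exact rules here are my guesses, not the ones used above):

```python
import re
import unicodedata

# Leading list markers: common bullet glyphs, or a short number followed by "." or ")".
BULLET_PATTERN = re.compile(r"^[ \t]*(?:[-*+•‣◦▪●]|\d{1,3}[.)])[ \t]+", re.MULTILINE)


def sanitize_for_tts(text: str) -> str:
    """Strip list markers and invisible characters, then tidy whitespace."""
    # Fold compatibility forms (ligatures, full-width letters) into plain ones.
    text = unicodedata.normalize("NFKC", text)
    text = BULLET_PATTERN.sub("", text)
    # Drop control and format characters (Unicode categories C*), keeping newlines and tabs.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    text = re.sub(r"[ \t]+", " ", text)   # collapse runs of spaces and tabs
    text = re.sub(r"\n{2,}", "\n", text)  # collapse blank lines
    return text.strip()
```

For example, `sanitize_for_tts("• First item\n- Second\u200b item\n\n\n3. Third")` returns `"First item\nSecond item\nThird"`. Accented letters such as the "é" in a French name pass through unchanged.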

I love it so much, in no small part thanks to IINA respecting media keys and headset buttons, so I can be listening to an article, remove an AirPod, and the article stops. This makes it so much easier to use around interruptions.


👤 A_D_E_P_T
Though it's not free -- there's a free tier, but it's highly restricted and not very functional -- I use elevenlabs.io for text to speech.

It's extremely good at mimicking voices, and it sounds quite natural even with those custom-generated voices.


👤 sfmz
I'm not familiar with Mac speech, so idk if this is better or worse:

https://huggingface.co/spaces/Xenova/text-to-speech-client


👤 solardev
Unfortunately all the best recent models are paid cloud endpoints. But ElevenLabs, OpenAI, Google Cloud, AWS, etc. all have pretty decent TTS endpoints if you're willing to pay per use.