I'm experimenting with adding voice to an LLM project I'm working on. I need accurate ASR and TTS, both as close to realtime as possible, ideally able to stream partial results.
I've tried searching online for the best options, but I'm just seeing SEO spam or outdated results.
I'm open to using a cloud service or self-hosting an open-source model; I just want the best possible speed and accuracy.
Does anyone have suggestions or recommendations?
Thanks!
Does anyone else know of other good options like what OP is looking for?