HACKER Q&A
📣 ryanbrwr

What would your audio conversational AI stack be?


I am wondering what the best stack is for creating conversational AI agents. What would you use for transcription, text generation, and audio generation?


  👤 brucethemoose2 Accepted Answer ✓
Whisper and a Llama 2 roleplaying/chat finetune, no question. I prefer koboldcpp, but I guess text-generation-webui with your Llama backend of choice is good because of its Whisper integration.

Not sure about speech generation. There are models that will get inflection and such right, but I am not up to date on this.
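
For what it's worth, here is a rough sketch of how the three stages could be wired together, so each one can be swapped independently. Everything in it is a made-up illustration: `VoiceAgent`, the `fake_*` stubs, and the plain "User:/Assistant:" prompt format are assumptions, not the API of Whisper, koboldcpp, or text-generation-webui. The stubs just stand in for a real speech-to-text model, a Llama 2 chat finetune, and whatever TTS you end up picking.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class VoiceAgent:
    """Hypothetical glue class: audio in -> text -> reply -> audio out."""
    transcribe: Callable[[bytes], str]   # speech-to-text stage (e.g. Whisper)
    generate: Callable[[str], str]       # text generation stage (e.g. a Llama 2 chat finetune)
    synthesize: Callable[[str], bytes]   # text-to-speech stage (left open on purpose)
    history: List[Tuple[str, str]] = field(default_factory=list)

    def build_prompt(self, user_text: str) -> str:
        # Assumed prompt format; real chat finetunes usually expect their own template.
        lines = []
        for user, bot in self.history:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {bot}")
        lines.append(f"User: {user_text}")
        lines.append("Assistant:")
        return "\n".join(lines)

    def turn(self, audio: bytes) -> bytes:
        # One conversational turn: transcribe, generate a reply, speak it.
        user_text = self.transcribe(audio).strip()
        reply = self.generate(self.build_prompt(user_text)).strip()
        self.history.append((user_text, reply))
        return self.synthesize(reply)


# Stub stages so the sketch runs without any model installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(prompt: str) -> str:
    last_user_line = prompt.splitlines()[-2]  # the final "User: ..." line
    return "You said: " + last_user_line[len("User: "):]


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


agent = VoiceAgent(fake_transcribe, fake_generate, fake_synthesize)
first = agent.turn(b"hello there")
second = agent.turn(b"what next?")
print(first.decode("utf-8"))
print(len(agent.history))
```

To make it real you would replace the stubs one at a time. For example, with the `openai-whisper` package the transcription stage could wrap `whisper.load_model("base").transcribe(path)["text"]`, though that call takes a file path or audio array rather than raw bytes, so the stage signature would need adjusting.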