HACKER Q&A
📣 spacetimeuser5

Which speech-to-text model would you recommend?


I may need to do some speech-to-text transcription (English at minimum, possibly multilingual later) from video or audio files. Which speech-to-text model or API would you recommend? Which one performs best, and can it also handle noise reduction and similar cleanup?


  👤 smoldesu Accepted Answer ✓
Whisper, 100%. It's small, fast and does a really good job with most of the recordings I can feed it. IIRC, there are both English and mixed-language models to choose from as well.
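
For anyone who wants to try it, here is a minimal sketch using the open-source `openai-whisper` Python package. It assumes `pip install openai-whisper` and `ffmpeg` on the PATH. The helper names `pick_model_name` and `transcribe_file` are made up for illustration and are not part of the library. The English-only models carry a `.en` suffix (tiny, base, small, medium); the plain names are the multilingual ones.

```python
# Sizes that ship with an English-only ".en" variant in openai-whisper.
EN_VARIANT_SIZES = {"tiny", "base", "small", "medium"}


def pick_model_name(size="base", english_only=False):
    """Hypothetical helper: map a size and language need to a Whisper model name."""
    if english_only and size in EN_VARIANT_SIZES:
        return f"{size}.en"  # English-only model, e.g. "base.en"
    return size  # multilingual model, e.g. "base" or "large"


def transcribe_file(path, size="base", english_only=False):
    """Hypothetical helper: transcribe an audio or video file and return the text."""
    # Imported lazily so pick_model_name works even without the package installed.
    import whisper

    model = whisper.load_model(pick_model_name(size, english_only))
    # Whisper decodes the input through ffmpeg, so video files should work too.
    result = model.transcribe(path)
    return result["text"]
```

Usage would be something like `transcribe_file("interview.mp4", english_only=True)`, where the file name is a placeholder. On the noise question: as far as I know Whisper has no separate denoising step. It tends to cope with background noise on its own, so it is probably worth trying the raw file first and only adding a denoiser beforehand if the results are poor.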