Which speech-to-text model would you recommend?

Question

I may need to do some speech-to-text on video or audio files (English for now, but possibly multilingual later). Which speech-to-text model or API would you recommend? Ideally it would give the best accuracy and could also handle noise reduction and similar cleanup.

smoldesu · Accepted Answer

Whisper, 100%. It's small, fast, and does a really good job on most of the recordings I feed it. IIRC, there are both English-only and multilingual models to choose from as well.
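For reference, here is a minimal sketch of running Whisper locally. It assumes the open-source `openai-whisper` Python package (`pip install openai-whisper`) and an `ffmpeg` binary on your PATH, which Whisper uses to decode audio and video files. `pick_model` and `transcribe` are hypothetical helper names, not part of Whisper. The helper picks an English-only `.en` checkpoint when you only need English and a multilingual one otherwise.

```python
# Sketch only: assumes `pip install openai-whisper` and `ffmpeg` on PATH.
# pick_model() and transcribe() are hypothetical helpers, not Whisper APIs.
import sys
from typing import Optional

# Whisper sizes that also ship an English-only ".en" checkpoint.
_EN_SIZES = {"tiny", "base", "small", "medium"}
# "large" is multilingual only (there is no "large.en").
_ALL_SIZES = _EN_SIZES | {"large"}


def pick_model(size: str = "base", english_only: bool = True) -> str:
    """Return a Whisper checkpoint name such as "base.en" or "small"."""
    if size not in _ALL_SIZES:
        raise ValueError(f"unknown Whisper size: {size!r}")
    if english_only and size in _EN_SIZES:
        return f"{size}.en"
    return size


def transcribe(path: str, size: str = "base", language: Optional[str] = "en") -> str:
    """Transcribe an audio or video file; language=None lets Whisper auto-detect."""
    import whisper  # imported lazily so pick_model() works without the package

    model = whisper.load_model(pick_model(size, english_only=(language == "en")))
    result = model.transcribe(path, language=language)
    return result["text"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python transcribe.py lecture.mp4 [language, e.g. "de"]
    lang = sys.argv[2] if len(sys.argv) > 2 else "en"
    print(transcribe(sys.argv[1], language=lang))
```

Larger sizes are slower but generally more accurate, so it's worth trying `small` or `medium` if `base` struggles with your recordings.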