- Deepgram also looks interesting. They recently released a new model (Nova) and also offer Whisper at a cheaper price ($0.0048/min). I've played with it briefly, but the DX looked a bit rough. They're also offering $200 in credits right now.
- If you're on a really tight budget, most browsers [1] support the SpeechRecognition API [2], which lets you transcribe for free. Quality depends on the browser: in Google Chrome, for example, it works excellently because the browser actually sends the audio to the cloud (probably to GCP's Google Cloud Speech to Text).
[1] https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecog... [2] https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecog...
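For anyone who wants to try the browser route, here's a rough sketch of wiring up the SpeechRecognition API. The helper names (`findRecognizer`, `collectTranscript`, `startDictation`) and the small structural types are my own, made up for illustration; only the properties and methods on the recognizer itself (`lang`, `continuous`, `interimResults`, `onresult`, `onerror`, `start`, `stop`) come from the Web Speech API. It assumes the constructor may only be available under the `webkit` prefix, and it takes the global object as a parameter so it can be exercised outside a browser.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface RecognitionAlternative {
  transcript: string;
  confidence: number;
}
interface RecognitionResultEvent {
  results: ArrayLike<ArrayLike<RecognitionAlternative>>;
}
interface RecognizerLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
  stop(): void;
}
type RecognizerCtor = new () => RecognizerLike;

// Looks for the standard constructor first, then the webkit-prefixed one.
// Returns null when the given scope offers neither.
function findRecognizer(scope: Record<string, unknown>): RecognizerCtor | null {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof ctor === "function" ? (ctor as RecognizerCtor) : null;
}

// Joins the top alternative of every result received so far into one string.
function collectTranscript(event: RecognitionResultEvent): string {
  const parts: string[] = [];
  for (let i = 0; i < event.results.length; i++) {
    const best = event.results[i]?.[0];
    if (best) parts.push(best.transcript.trim());
  }
  return parts.join(" ");
}

// Starts continuous recognition and reports the running transcript.
// Returns null if the scope has no recognizer (unsupported browser).
function startDictation(
  scope: Record<string, unknown>,
  onText: (text: string) => void,
  onError: (code: string) => void = () => {},
  lang = "en-US",
): RecognizerLike | null {
  const Ctor = findRecognizer(scope);
  if (Ctor === null) return null;
  const recognizer = new Ctor();
  recognizer.lang = lang;
  recognizer.continuous = true;
  recognizer.interimResults = false; // only final results
  recognizer.onresult = (event) => onText(collectTranscript(event));
  recognizer.onerror = (event) => onError(event.error);
  recognizer.start();
  return recognizer;
}
```

In a page you'd call something like `startDictation(window as unknown as Record<string, unknown>, (text) => console.log(text))` and fall back to a paid API when it returns `null`. The browser will ask for microphone permission the first time.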
Depending on the platform you're targeting, https://github.com/dictation-toolbox/dragonfly might be interesting to you.
Whisper is the cheapest.
AssemblyAI and Google Cloud Speech to Text are more accurate.
Overall, I wouldn't recommend Whisper unless you can live with lower transcription accuracy. I'm hoping they release the "GPT-4" equivalent of Whisper.
I'm feeding the transcripts to GPT, so the mistakes usually don't matter; it normally interprets them as what they were supposed to be.
However, depending on what you're building, you may want to consider running speech-to-text locally on users' devices, so you don't pay for the cloud at all.
You should also work out whether you'll need model adaptation, such as adding custom industry jargon. That might be challenging with Whisper.