Would you recommend OpenAI Whisper for Speech to text?

Question

I'm building a product that requires speech-to-text. I'm thinking of going with Whisper, since it seems cheap ($0.006/min) and I've heard the transcription quality is good. Are there any better alternatives?

drag0s · Accepted Answer

- AssemblyAI was the winner in the tests we ran some months ago: very reliable and accurate.
- Deepgram also looks interesting. They recently released a new model (Nova) and also offer Whisper at a lower price ($0.0048/min). I've only played with it briefly, but the DX looked a bit rough. They're also offering $200 in credits right now.
- If you're on a really tight budget, most browsers [1] support the SpeechRecognition API [2], which lets you transcribe for free. Quality depends on the browser; in Google Chrome it works very well because the browser actually sends the audio to the cloud (it probably uses GCP's Google Cloud Speech-to-Text).

[1] https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecog...
[2] https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecog...
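
To put the per-minute prices above in perspective, here is a rough back-of-the-envelope sketch. The two prices are the ones quoted in this thread and may be out of date; the 1,000 hours/month volume and the function names are made up for illustration.

```python
# Per-minute prices as quoted in this thread; check the vendors' pricing
# pages before relying on them.
PRICE_PER_MIN = {
    "OpenAI Whisper API": 0.006,
    "Whisper via Deepgram": 0.0048,
}


def monthly_cost(price_per_min: float, hours_per_month: float) -> float:
    """Return the monthly transcription cost in dollars, rounded to cents."""
    return round(price_per_min * hours_per_month * 60, 2)


def minutes_covered(credits: float, price_per_min: float) -> float:
    """Return how many minutes of audio a credit balance pays for."""
    return credits / price_per_min


if __name__ == "__main__":
    hours = 1000  # hypothetical monthly audio volume
    for name, price in PRICE_PER_MIN.items():
        print(f"{name}: ${monthly_cost(price, hours):,.2f}/month")
    # How far the $200 in credits mentioned above would go at $0.0048/min
    print(f"{minutes_covered(200, 0.0048) / 60:.0f} hours covered by credits")
```

At these prices the gap is small in absolute terms unless your volume is large, so accuracy and developer experience will probably matter more than the per-minute rate.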

FloatArtifact · Answer

I've experimented with Whisper. I don't know of a way to handle commands without parsing the dictation output. Bottom line: to my knowledge, the model has to be fed 30 seconds of audio, so if your utterance is 5 seconds long, you'll need 25 seconds of silence.
Depending on the platform you're targeting, https://github.com/dictation-toolbox/dragonfly might be interesting to you.
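
To illustrate the 30-second window arithmetic, here is a minimal plain-Python sketch. It assumes 16 kHz mono audio, which is what Whisper expects as far as I know. The open-source `whisper` package ships its own `pad_or_trim` helper that does this on arrays, so you normally don't add the silence by hand; the function below is only a stand-in to show the idea, not the library's code.

```python
SAMPLE_RATE = 16_000   # Hz; assumed input rate for Whisper
WINDOW_SECONDS = 30    # length of one model input window
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def pad_or_trim(samples: list[float], length: int = WINDOW_SAMPLES) -> list[float]:
    """Zero-pad (or cut) mono audio samples to exactly `length` samples."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


# Stand-in for a 5-second spoken command (constant value instead of real audio)
utterance = [0.1] * (5 * SAMPLE_RATE)
window = pad_or_trim(utterance)

# Seconds of silence appended to fill the window
silence_seconds = (len(window) - len(utterance)) / SAMPLE_RATE
print(silence_seconds)  # 25.0
```

The practical cost for short commands is that a 5-second utterance still goes through a full 30-second window of compute.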

tikkun · Answer

I've tried a few:
- Whisper is cheapest
- AssemblyAI and Google Cloud Speech to Text are more accurate

Overall, I wouldn't recommend Whisper unless the transcription accuracy doesn't need to be high. I'm hoping they release the "GPT-4" equivalent of Whisper.

satvikpendem · Answer

You can self-host it too if you want. That's the good part about Whisper: it's open source.
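
For reference, a minimal self-hosting sketch, assuming the open-source `openai-whisper` Python package and `ffmpeg` are installed. The `"base"` model size, the `meeting.mp3` file name, and both function names are placeholders I picked for illustration.

```python
def transcribe_locally(audio_path: str, model_name: str = "base") -> dict:
    """Transcribe a file on your own machine with the open-source package."""
    # Imported lazily so the formatting helper below works without the package.
    import whisper  # pip install openai-whisper; also needs ffmpeg on PATH

    model = whisper.load_model(model_name)
    # Returns a dict with the full "text" plus timestamped "segments"
    return model.transcribe(audio_path)


def format_segments(result: dict) -> str:
    """Render timestamped segments from a transcription result, one per line."""
    lines = []
    for seg in result.get("segments", []):
        lines.append(
            f"[{seg['start']:6.1f}s - {seg['end']:6.1f}s] {seg['text'].strip()}"
        )
    return "\n".join(lines)
```

Something like `print(format_segments(transcribe_locally("meeting.mp3")))` would then print one timestamped line per segment. Larger model sizes are generally more accurate but slower, so expect to trade hardware cost against the per-minute API price.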

qup · Answer

I've been using Whisper since it came out; it's open source, and I know I can host my own. I'd say I get 95% accuracy, possibly more. I'm feeding the output to GPT, which usually doesn't care about the mistakes; it normally interprets them as what they were supposed to be.
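
If you want to put a number on accuracy for your own audio rather than eyeballing it, one common measure is word error rate (WER). This is a minimal sketch, not how the figure above was obtained; the function name and sample sentences are made up, and real evaluations usually normalize punctuation and numbers first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming edit distance over words, keeping one row at a time
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    truth = "turn on the kitchen lights"
    heard = "turn on the kitchen light"
    print(word_error_rate(truth, heard))  # 0.2
```

Running the same handful of hand-transcribed clips through each candidate service and comparing WER is a cheap way to check claims like "more accurate" against your own domain.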

java_beyb · Answer

If your decision is cost-oriented, then the Whisper API is the cheapest, at least based on the prices other API companies advertise on their websites.
However, depending on what you're building, you may want to consider local speech-to-text, i.e. running the model on users' devices, so you don't pay for the cloud at all.
You should also figure out whether you'll need model adaptation, such as adding custom industry jargon. That might be challenging with Whisper.

ezedv · Answer

You can use TranscribeMe, it's for Telegram and WhatsApp; it's totally free! https://transcribeme.app

muttantt · Answer

Use Deepgram; they recently added Whisper as a model too.

adyashakti · Answer

Free iOS app: https://apps.apple.com/us/app/aiko/id1672085276