Speech to text models, are they usable yet?

Question

I found the following models / tools:

* https://huggingface.co/facebook/wav2vec2-large-robust-ft-swbd-300h
* https://huggingface.co/datasets/sil-ai/bloom-speech
* https://huggingface.co/spaces/Matthijs/speecht5-asr-demo
* https://alphacephei.com/vosk/

It's a bit hard to search for, since "speech to text" results are more often than not interspersed with results for "text to speech" (for which there are many more results altogether apparently).

In any case, there seems to be quite some hardware and/or tinkering required to experiment with the ones I found.

So I'm asking if anyone here has already experimented a bit in this space and would share their experiences? Would also be great to learn on what hardware.

Ldorigo · Accepted Answer

Whisper by OpenAI is really, really good, and unlike the rest of their offering, it's actually open. It doesn't support streaming dictation, though.

Depending on whether you want something to run locally and play with or just want to use voice-to-text in your app, there are plenty of startups offering APIs with varying pros and cons (and lots of variability in how well they handle out-of-vocabulary words, which is usually the biggest problem with these tools). If you're interested, add some details and I'll edit my answer with some links once I'm on my computer.

ilovefood · Answer

I have been using this with a lot of success for a while now: https://github.com/KoljaB/RealtimeSTT/tree/master. It works in real time, without any delays, on an old Nvidia card.
I tried it with German & English without issues. It should also work for French but might need a bit of tweaking. The code is very straightforward, but depending on the context I'd recommend experimenting with the parameters that would suit you.
It's using a model called "Whisper" under the hood.
Have fun :)
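
For anyone who wants a starting point, here is a minimal sketch of how I'd wrap it, assuming the `AudioToTextRecorder` class and its `model` / `language` parameters work the way I remember from the project's README (check the README for your version). The `dictate_once` helper and its `recorder` argument are my own illustration, not part of the library.

```python
def dictate_once(language="", model="tiny", recorder=None):
    """Block until one utterance has been spoken, then return its text.

    `recorder` is a hypothetical hook so an already-created recorder
    (or a stub) can be passed in; by default one is created here.
    """
    if recorder is None:
        # Assumption: installed with `pip install RealtimeSTT`,
        # and a working microphone is available.
        from RealtimeSTT import AudioToTextRecorder
        recorder = AudioToTextRecorder(model=model, language=language)
    # text() waits for speech to start and stop, then returns the transcription.
    return recorder.text()

# Example call (needs a microphone): print(dictate_once(language="de"))
```

As far as I know the library starts worker processes, so in a real script the call should sit under an `if __name__ == "__main__":` guard. The `model` size and `language` are the first parameters I'd experiment with.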

mkl · Answer

For searching, [speech recognition] or ["speech to text"] should work. I've experimented a little with whisper.cpp and was quite impressed with how it coped with technical language, though you can't get away without an editing pass.

ezedv · Answer

I know a few more:

TranscribeMe: a bot that transcribes WhatsApp and Telegram voice notes https://www.transcribeme.app/
TranscribeGo: Audio transcription and analysis https://www.transcribego.com/

qup · Answer

whisper is very good, I wouldn't use anything else personally. it's free and open, you can run it locally. it makes very few errors (for my own voice, anyway)
https://github.com/openai/whisper
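
Running it locally from Python takes only a few lines. A rough sketch, assuming the `openai-whisper` package from that repo is installed and `ffmpeg` is on your PATH; the `transcribe_file` helper and its `model` argument are names made up here for illustration, while `whisper.load_model` and `model.transcribe` are the package's own calls.

```python
def transcribe_file(audio_path, model_name="base", model=None):
    """Return the transcript of one audio file as plain text.

    `model` lets a caller reuse an already-loaded model (or pass a stub);
    when it is None the whisper package is imported and a model is loaded here.
    """
    if model is None:
        # Assumption: installed with `pip install openai-whisper`.
        import whisper
        # Smaller models ("tiny", "base") are faster, larger ones more accurate.
        model = whisper.load_model(model_name)
    # transcribe() returns a dict with "text", "segments" and "language".
    result = model.transcribe(audio_path)
    return result["text"].strip()

# Example call (downloads the model on first use): print(transcribe_file("memo.mp3"))
```

Loading the model once and passing it back in via `model` avoids reloading it for every file, which matters if you're transcribing a batch.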

adr1an · Answer

https://github.com/mozilla/DeepSpeech maybe

pbronez · Answer

Try searching for “automatic speech recognition” or “ASR”

dharmab · Answer

whisper.cpp works pretty well on desktop computers, even on CPUs a few generations old.
