* https://huggingface.co/facebook/wav2vec2-large-robust-ft-swbd-300h
* https://huggingface.co/datasets/sil-ai/bloom-speech
* https://huggingface.co/spaces/Matthijs/speecht5-asr-demo
* https://alphacephei.com/vosk/
It's a bit hard to search for, since "speech to text" results are usually mixed in with results for "text to speech" (for which there seem to be many more results overall).
In any case, the ones I found seem to require a fair amount of hardware and/or tinkering to experiment with.
So I'm asking whether anyone here has already experimented a bit in this space and would share their experience. It would also be great to hear what hardware you used.
Depending on whether you want something to run locally and play with or just want to use voice-to-text in your app, there are plenty of startups offering APIs with varying pros and cons. They vary a lot in how well they handle out-of-vocabulary words, which is usually the biggest problem with these tools. If you're interested, add some details and I'll edit my answer with some links once I'm at my computer.
I tried it with German & English without issues. It should also work for French but might need a bit of tweaking. The code is very straightforward, but depending on the context I'd recommend experimenting with the parameters that would suit you.
It's using a model called "Whisper" under the hood.
Have fun :)
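For anyone who wants to poke at Whisper directly, here is a minimal sketch of local transcription. It assumes the open-source `openai-whisper` Python package and `ffmpeg` are installed; the helper names `transcription_options` and `transcribe_file` are made up for illustration, and this is not necessarily how the tool mentioned above calls the model. The model size and the language hint are the parameters I'd expect to matter most when experimenting.

```python
from pathlib import Path


def transcription_options(language=None, temperature=0.0):
    """Build keyword arguments for Whisper's transcribe() call.

    language=None leaves language detection to Whisper; pass e.g. "de",
    "en" or "fr" to force a language. (Hypothetical helper.)
    """
    options = {
        "temperature": temperature,  # 0.0 means greedy decoding
        "fp16": False,               # use FP32, which also works on CPU-only machines
    }
    if language is not None:
        options["language"] = language
    return options


def transcribe_file(audio_path, model_size="base", language=None):
    """Transcribe one audio file and return the recognised text."""
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(path)

    # Imported lazily so the rest of the module loads without the package.
    # Assumes: pip install openai-whisper, plus ffmpeg on the PATH.
    import whisper

    # Larger sizes ("small", "medium", "large") are slower and need more memory.
    model = whisper.load_model(model_size)
    result = model.transcribe(str(path), **transcription_options(language))
    return result["text"].strip()


if __name__ == "__main__":
    import sys

    # Usage: python transcribe.py recording.mp3 [language]
    lang = sys.argv[2] if len(sys.argv) > 2 else None
    print(transcribe_file(sys.argv[1], language=lang))
```

The smaller model sizes should run on a plain CPU, just slowly; a GPU mostly becomes worthwhile for the larger ones.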
TranscribeMe: a bot that transcribes WhatsApp and Telegram voice notes https://www.transcribeme.app/
TranscribeGo: audio transcription and analysis https://www.transcribego.com/