HACKER Q&A
📣 leobg

Whisper Transcription API?


Do you know of a transcription API that uses Whisper on the backend?

For those who don't own a GPU, or who want transcription without having to deal with CPU parallelization and the like.

Ideally, it should support both transcribing files and streaming. And it should allow me to select which Whisper model to use.

I know about whisper.cpp [1]. I'm looking for an off-site API so I don't have to deal with the backend myself.

Thanks!


  👤 leobg Accepted Answer ✓
Answering my own question, for the sake of posterity:

Just found banana.dev, which allows you to essentially 1-click deploy Whisper.

Then you can use their Python library to connect and run inference.
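For reference, a minimal sketch of what that might look like. The payload field name ("mp3BytesString"), the API/model keys, and the `banana_dev` call shape are assumptions based on their template deployments, not verified against current docs, so check their reference before relying on any of it:

```python
# Hypothetical sketch of calling a Whisper deployment on banana.dev.
# The payload field name and credentials below are assumptions, not
# confirmed API details -- consult banana.dev's own docs before use.
import base64
import json


def build_whisper_payload(audio_bytes: bytes) -> dict:
    """Package raw audio as a base64 JSON payload for the deployment.

    The field name "mp3BytesString" is an assumption taken from their
    Whisper template; your deployment may expect a different key.
    """
    return {"mp3BytesString": base64.b64encode(audio_bytes).decode("utf-8")}


payload = build_whisper_payload(b"\x00\x01fake-audio")
print(json.dumps(payload)[:40])

# With the pip-installable banana_dev SDK, inference would then look
# roughly like this -- commented out because it needs real credentials:
#
# import banana_dev as banana
# out = banana.run("YOUR_API_KEY", "YOUR_MODEL_KEY", payload)
# print(out)
```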

They use the GPU variant by default (not whisper.cpp), and you are billed for GPU time. By default, the model "hibernates" after 10 seconds of inactivity, which for me results in inference times of around 15 seconds (most of that, I presume, being the model "waking up").