Is there a Whisper-like speech-to-text tool that detects the speaker?
I found some commercial (expensive) offerings that do this, but there doesn't seem to be an open-source way to categorise Whisper's output by speaker/source.
I'm thinking of this for podcast analysis.