HACKER Q&A
📣 mjbale116

What do you use for speaker diarization?


Hi,

I am looking for a fire-and-forget solution, akin to Whisper, where I can give it a WAV file with around 12 speakers and get back a diarization in the format (speaker_1, speaker_2, etc.).

whispercpp gives labels like speaker_turn, which is not what I am looking for. I need to know who said what.

NVIDIA NeMo only works with 4 speakers and unfortunately is not good enough for me.

Do you have an open source solution that you can suggest? Or a potential pipeline?

Much appreciated!


  👤 AlexeyBrin Accepted Answer ✓
WhisperX with pyannote, but it is not perfect: sometimes the same speaker will get multiple labels.

There is no open-source fire-and-forget solution as far as I know.
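
For the pipeline part of the question, one way to get from diarization output to "who said what" is to match each transcript segment to the speaker turn it overlaps most, then rename the labels to speaker_1, speaker_2, and so on. Here is a rough sketch of that idea in plain Python. It is not WhisperX's or pyannote's actual code: the function name `assign_speakers` and the tuple layouts are assumptions made up for illustration, with the segments presumed to come from a Whisper-style transcriber and the turns from a diarizer such as pyannote.

```python
from collections import defaultdict


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker it overlaps most.

    segments: list of (start_sec, end_sec, text)       -- hypothetical layout
    turns:    list of (start_sec, end_sec, raw_label)  -- hypothetical layout
    Returns a list of (speaker_N, text) pairs.
    """
    names = {}  # raw diarizer label -> speaker_N, numbered by first appearance
    result = []
    for seg_start, seg_end, text in segments:
        # Total overlap in seconds between this segment and each raw label.
        overlap = defaultdict(float)
        for turn_start, turn_end, label in turns:
            shared = min(seg_end, turn_end) - max(seg_start, turn_start)
            if shared > 0:
                overlap[label] += shared
        if overlap:
            best = max(overlap, key=overlap.get)
            speaker = names.setdefault(best, f"speaker_{len(names) + 1}")
        else:
            # No diarization turn covers this segment at all.
            speaker = "unknown"
        result.append((speaker, text))
    return result


if __name__ == "__main__":
    segments = [(0.0, 2.0, "hello"), (2.0, 5.0, "hi there")]
    turns = [(0.0, 2.1, "SPEAKER_03"), (1.9, 5.2, "SPEAKER_00")]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

This only handles the labeling step. It does nothing about the same person being split across several raw labels, so with 12 speakers you would probably still need to review and merge labels by hand.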