What do you use for speaker diarization?

Question

Hi,

I am looking for a fire-and-forget solution akin to Whisper, where I can give it a WAV file with around 12 speakers and get back a diarization in the format (speaker_1, speaker_2, etc.).

whisper.cpp gives labels like speaker_turn, which is not what I am looking for: I need to know who said what.

NVIDIA NeMo only works with 4 speakers, which unfortunately is not good enough for me.

Do you have an open source solution that you can suggest? Or a potential pipeline?

Much appreciated!

AlexeyBrin · Accepted Answer

WhisperX with pyannote, but it is not perfect: sometimes the same speaker gets multiple labels. As far as I know, there is no open source fire-and-forget solution.
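
For anyone wondering what the "who said what" step of such a pipeline might look like, here is a rough sketch. It assumes you already have two lists: timestamped transcript segments from a transcription model and speaker turns from a diarization model. The names `Turn`, `Segment` and `assign_speakers` are made up for this illustration and are not the WhisperX or pyannote API. Each segment simply gets the speaker whose turns overlap it the longest.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization turn: a speaker label active between start and end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """One transcript segment with its timestamps (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="unknown"):
    """Label each segment with the speaker that overlaps it the longest.

    Hypothetical helper for illustration only, not a library function.
    """
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Fall back to a placeholder when no diarization turn covers the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled


# Made-up example data, not real model output.
turns = [Turn(0.0, 4.0, "speaker_1"), Turn(4.0, 9.0, "speaker_2")]
segments = [Segment(0.5, 3.5, "Hello"), Segment(3.0, 8.0, "Hi there")]
print(assign_speakers(segments, turns))
# [('speaker_1', 'Hello'), ('speaker_2', 'Hi there')]
```

This does nothing about the same person being split across several labels. That would need a separate step, such as merging labels after the fact, which is not shown here.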