Am I massively oversimplifying, or is it possible that they just took all the text input, ran it through Whisper at training time, and did an otherwise “normal” training run?
It can’t be this simple, right? I’m assuming I just don’t know enough to know how wrong I am.
[EDIT] Before someone asks, I did ask ChatGPT this question, and what it spit out is roughly what I would characterize as the title, but obviously I can’t falsify it. ChatGPT currently knows more about AI than I do.