To be specific, I mean cutting up the original input file on word boundaries: one audio file per word.
I’m curious whether, given a large enough input set, you could create a sort of dictionary of words, each with one or more tiny audio files of that word as spoken by a given person.
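To make the idea concrete, here is a rough sketch of the kind of dictionary I have in mind. It assumes the word timings already exist as (word, start, end) tuples from some separate alignment step, which is the part I don't know how to get. The function name `build_word_dictionary` and the toy data are made up for illustration, and the "audio" is just a list of sample values rather than a real file.

```python
from collections import defaultdict


def build_word_dictionary(samples, sample_rate, alignments):
    """Slice one mono recording into per-word clips.

    samples: sequence of PCM sample values.
    alignments: iterable of (word, start_seconds, end_seconds). These are
    assumed to come from an external word-level alignment step; producing
    them is not shown here.
    """
    clips = defaultdict(list)
    for word, start, end in alignments:
        # Convert times to sample indices and clamp to the recording.
        lo = max(0, int(round(start * sample_rate)))
        hi = min(len(samples), int(round(end * sample_rate)))
        if hi > lo:
            # Lowercase the key so repeated words share one entry.
            clips[word.lower()].append(list(samples[lo:hi]))
    return dict(clips)


# Toy data: one second of "audio" at 10 samples per second.
samples = list(range(10))
alignments = [("Hello", 0.0, 0.3), ("world", 0.4, 0.8), ("hello", 0.8, 1.0)]
word_clips = build_word_dictionary(samples, 10, alignments)
```

Each word ends up with a list of clips, so a word spoken several times keeps every take.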
Bonus points if it could do the same for sentences.
Does such a thing exist?
The next step would be using the database of words and sentences to reproduce someone’s speech using audio of their actual words.
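As a rough illustration of that next step, the sketch below just looks up each word of a sentence in a word-to-clips mapping and glues the clips together. Everything here is hypothetical: the name `reassemble`, the `gap_samples` parameter, and the toy clips are placeholders, and it simply takes the first available clip for each word with no smoothing between clips.

```python
def reassemble(sentence, word_clips, gap_samples=0):
    """Concatenate stored clips for each word in the sentence.

    word_clips: dict mapping a lowercase word to a list of clips, where
    each clip is a list of sample values (assumed layout).
    Returns (samples, missing_words).
    """
    out = []
    missing = []
    for word in sentence.lower().split():
        options = word_clips.get(word)
        if not options:
            # No recording of this word exists for the speaker.
            missing.append(word)
            continue
        if out and gap_samples:
            # Insert a short stretch of silence between words.
            out.extend([0] * gap_samples)
        out.extend(options[0])  # naive choice: always the first take
    return out, missing


# Toy clips standing in for real recordings.
toy_clips = {"hello": [[1, 2]], "world": [[3, 4, 5]]}
audio, missing = reassemble("Hello there world", toy_clips, gap_samples=1)
```

Words with no clip are reported back rather than skipped silently, since a large input set would presumably still have gaps.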
I’m aware that AI voice cloning exists, but that’s not what I’m asking about.