Check out the samples from their announcement blog post: https://openai.com/blog/chatgpt-can-now-see-hear-and-speak
Does anyone have an idea of what tech stack they're using to generate the voices? They've said they hired professional voice actors for the training data, but what about the software stack?
Another point about OpenAI's API voices: I don't know if it's intentional, but a couple of them sound gender-neutral, if that makes sense. You can't tell whether it's a male or female voice (it's somewhere in between), which makes them sound less realistic.