Check out the samples from their announcement blog post: https://openai.com/blog/chatgpt-can-now-see-hear-and-speak
Does anyone have an idea of what tech stack they're using to generate the voices? They've said they hired professional voice actors for the training data, but what about the software stack?
Another point about OpenAI's API voices: I don't know if it's intentional, but a couple of them sound gender-neutral, if that makes sense. You can't tell whether it's a male or female voice (it's somewhere in between), which makes them sound less realistic.