Given a large enough dataset of MP3 files, would it be possible to predict the next millisecond of audio from the previous milliseconds and generate songs that way? Will we generate videos by predicting the next best frame?
Is there any technical reason we couldn't collect first-person audio and video with the cameras and microphone on a Quest Pro and generate what the next few minutes of our lives might look like?
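For intuition, here's a minimal sketch of that "predict the next sample from the previous ones" idea, using plain least-squares linear prediction instead of a neural network. Everything here (function names, the toy sine signal, the order of 32) is hypothetical illustration; real systems like AudioLM predict learned discrete tokens with a Transformer rather than fitting a linear filter to raw samples.

```python
# Hypothetical sketch: autoregressive next-sample prediction via linear
# prediction. Not how AudioLM works internally; same core idea, though.
import numpy as np

def fit_predictor(signal: np.ndarray, order: int = 32) -> np.ndarray:
    """Fit coefficients so that signal[t] ~ coeffs @ signal[t-order:t]."""
    # Sliding windows of past samples (the context)...
    X = np.stack([signal[i:i + order] for i in range(len(signal) - order)])
    # ...and the sample that immediately follows each window (the target).
    y = signal[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def generate(seed: np.ndarray, coeffs: np.ndarray, n_samples: int) -> np.ndarray:
    """Autoregressively extend `seed`, one sample at a time."""
    out = list(seed)
    for _ in range(n_samples):
        context = np.array(out[-len(coeffs):])
        out.append(float(coeffs @ context))  # feed each prediction back in
    return np.array(out)

# Toy demo: learn to continue a 440 Hz tone sampled at 16 kHz.
sr = 16_000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
coeffs = fit_predictor(tone, order=32)
continuation = generate(tone[:320], coeffs, n_samples=sr // 10)
```

A linear filter can only continue simple, stationary signals like this tone; the open question in the parent comment is exactly whether scaling the predictor up (and feeding it enough data) gets you from "continue a sine wave" to "continue a song, or your morning".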
Not milliseconds, but AudioLM [1] already does this with just seconds of context, for speech (and piano). The results are already very convincing (to me).
[1] https://google-research.github.io/seanet/audiolm/examples/