HACKER Q&A
📣 rexbee

What do you think is coming next for generative AI?


It seems GPT-3 suggests the most likely next token/word given the previous words.

Will it be possible to, given a large enough dataset of MP3 files, predict the next millisecond of audio based on previous milliseconds of audio and generate songs? Will we generate videos by predicting the next best frame?
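To make the "predict the next piece from the previous pieces" idea concrete, here is a minimal sketch of an autoregressive loop. It is a toy, not how GPT-3 or any audio model works internally: it assumes the signal has already been quantized into small integer symbols, and it uses a simple bigram count table (each symbol depends only on the one before it) in place of a neural network. The names `fit_bigram` and `generate` and the training sequence are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def fit_bigram(sequence):
    """Count how often each symbol is followed by each other symbol."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, rng):
    """Autoregressive loop: sample the next symbol given the last one, append, repeat."""
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # nothing was ever observed after this symbol
        symbols = list(followers)
        weights = [followers[s] for s in symbols]
        out.append(rng.choices(symbols, weights=weights, k=1)[0])
    return out


# Hypothetical "quantized audio": a short repeating up-and-down pattern.
training = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0]
model = fit_bigram(training)
sample = generate(model, start=0, length=16, rng=random.Random(0))
print(sample)
```

The same loop shape applies whether the symbols are word tokens, audio codes, or video frames. What changes is the model that scores the next symbol and how much earlier context it can look at.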

Is there any technical reason we couldn't collect first-person audio and video data with the cameras and microphones on a Quest Pro and generate how the next few minutes of our life could look?


  👤 CrypticShift Accepted Answer ✓
> predict the next millisecond of audio based on previous milliseconds of audio

Not milliseconds, but AudioLM [1] already does this from a prompt of just a few seconds, for speech (and piano). Results are already very convincing (to me).

[1] https://google-research.github.io/seanet/audiolm/examples/


👤 terminal_d
"Signatures" on videos to prove that, yes, they are "authentic" and not AI-generated. I have no idea how it'd be enforced though.
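A rough sketch of what such a signature check might look like, assuming the recording device holds a secret key and tags the raw bytes at capture time. This uses HMAC from the Python standard library, which is a symmetric scheme: anyone who can verify can also forge, so a real system would more likely use public-key signatures. The key, the byte string, and the function names `sign_video` and `verify_video` are hypothetical, and none of this answers the enforcement question.

```python
import hashlib
import hmac


def sign_video(video_bytes: bytes, key: bytes) -> str:
    """Return a hex tag that depends on both the video content and the secret key."""
    return hmac.new(key, video_bytes, hashlib.sha256).hexdigest()


def verify_video(video_bytes: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_video(video_bytes, key)
    return hmac.compare_digest(expected, tag)


# Hypothetical key and stand-in "footage" for illustration only.
key = b"hypothetical-camera-secret"
footage = b"\x00\x01fake video frames"
tag = sign_video(footage, key)

print(verify_video(footage, key, tag))            # untouched footage
print(verify_video(footage + b"edit", key, tag))  # modified footage
```

Any change to the bytes, including a harmless re-encode, invalidates the tag, which is one reason a scheme like this is hard to use in practice.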

👤 drKarl
Ah, you mean like in Devs?