How is Stable Diffusion different from sampling in hip-hop?

Question

Stable Diffusion looks suspiciously like hip-hop sampling in the 80s. How do these products really differ? Will Stable Diffusion face the same scrutiny? Could licensing be a solution (think official training sets)?

jonas_kgomo · Accepted Answer

Sampling in hip-hop is about taking one or two audio sources and mapping them into some new audio, whereas SD was trained on 2.3 billion data points. In a diffusion model you add noise to an input image until it is unrecognizable, and the model learns to reverse that noising. For a song this would mean making it unrecognizable by making it literally sound like noise, and then trying to denoise it. I assume this process wouldn't be very helpful, but it might be a good experiment. OpenAI's Jukebox uses an autoencoder method with autoregressive transformers; given this architecture, I would assume you can theoretically apply a Stable Diffusion-style architecture to audio to generate new audio. https://openai.com/blog/jukebox/
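To make the "turn a song into noise" idea concrete, here is a rough sketch of the forward (noising) half of a diffusion process applied to a waveform. It is only an illustration, not how Stable Diffusion or Jukebox is actually implemented: the linear noise schedule, the step count, the function names `make_alpha_bar` and `add_noise`, and the 440 Hz sine wave standing in for a song are all assumptions made for the example.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bar, rng):
    """Jump straight to noising step t: scale the clean signal down, mix Gaussian noise in."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_rate = 16_000  # hypothetical sample rate
    time = np.arange(sample_rate) / sample_rate
    clip = np.sin(2 * np.pi * 440.0 * time)  # one second of a 440 Hz tone as a stand-in "song"

    alpha_bar = make_alpha_bar()
    for t in (0, 250, 999):
        noisy, _ = add_noise(clip, t, alpha_bar, rng)
        # Correlation with the original falls toward zero as t grows.
        corr = np.corrcoef(clip, noisy)[0, 1]
        print(f"step {t:4d}: correlation with original = {corr:.3f}")
```

At the last step the clip is essentially indistinguishable from white noise. A trained diffusion model would be a network that learns to predict the `noise` term from the noisy signal and the step, so that it can run this process in reverse; that part is not shown here.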