I understand that composing involves a lot of nuance and isn't just putting notes down and calling it a day. But it's still a little hard to believe that it's much more difficult to make an AI that can, say, compose music tracks like https://www.youtube.com/watch?v=OSPkn-iHPWA (Pokemon Sword's Title Screen) than it was to make ChatGPT.
Is Riffusion (https://www.riffusion.com/) the cutting-edge tech here? Or am I missing something?
I played with Riffusion some and was not at all impressed. It mimics without understanding: while it occasionally did something interesting, it has no real comprehension of time scales beyond short loops, no grasp of larger structures, and it fails completely if the prompt is something it cannot easily research. This is roughly where AI-generated music has been stuck for a decade now and cannot seem to push past. Part of this is probably because AI was integrated into composition a good while ago: composers tend to treat it more like an instrument or a filter than something that writes music, and most of the work on AI in music goes toward those ends, not toward getting AI good at composition.
I think that's one of the big issues behind the relative lack of emphasis on AI-generated music. Not only is the hard stuff genuinely hard; the easy stuff, generating music that conforms to standard Western melody/harmony norms and playing it on a regular digital instrument, is so easy it doesn't need NNs.
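To illustrate the "doesn't need NNs" point, here's a minimal rule-based sketch: a weighted random walk over the C-major scale that fills 4/4 bars, favors stepwise motion, and resolves to the tonic. The scale, step weights, durations, and bar count are illustrative assumptions, not any particular tool's method.

```python
import random

# Hypothetical rule-based melody generator: C-major scale, 4/4 bars,
# mostly stepwise motion, final note resolves to the tonic. No NN involved.

SCALE = ["C4", "D4", "E4", "F4", "G4", "A4", "B4", "C5"]  # C-major, one octave
DURATIONS = [1.0, 1.0, 2.0, 0.5]  # in beats; quarters listed twice to make them more common

def step_weights(index):
    """Favor small steps from the current scale degree; allow occasional leaps."""
    weights = []
    for target in range(len(SCALE)):
        distance = abs(target - index)
        weights.append(4.0 if distance <= 1 else (2.0 if distance == 2 else 0.5))
    return weights

def generate_melody(bars=4, beats_per_bar=4, seed=None):
    rng = random.Random(seed)
    melody = []
    index = 0  # start on the tonic
    for bar in range(bars):
        remaining = beats_per_bar
        while remaining > 0:
            duration = rng.choice([d for d in DURATIONS if d <= remaining])
            if bar == bars - 1 and remaining - duration <= 0:
                index = 0  # last note of the phrase: resolve to the tonic
            else:
                index = rng.choices(range(len(SCALE)), weights=step_weights(index))[0]
            melody.append((SCALE[index], duration))
            remaining -= duration
    return melody

if __name__ == "__main__":
    for note, beats in generate_melody(seed=42):
        print(f"{note}\t{beats} beat(s)")
```

A few dozen lines of weighted randomness already produce something that sounds "rule-abiding"; the hard part is long-range structure and intent, which is exactly where the NN approaches also struggle.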
Compared with other creative industries, musicians' copyright lawyers have also been extremely successful at demanding writing credits for 'inspiration'.
IMO, for music, the limiting factor today is more copyright (=datasets) than the models per se [1].
[1] https://google-research.github.io/seanet/musiclm/examples/