I did think, though, that a prerequisite for such a model would be a system that could separate a track into its component instruments (and reverse-engineer all the audio mixing that went into the final product), in order to reduce the dimensionality of the input to the learning model.
There's been some progress on that front, but not enough to produce a perfect transcription, and I'm not even sure that a sheet-music transcription would be the ideal data representation for an AI to truly understand what makes a good piece of music anyway.