My layman's understanding of GAN machine learning is that you have two networks working against each other: for example, one producing images of people's faces and the other trying to determine whether a given image is a person's face or not. When the two networks are pitted against each other, you can eventually generate believable images of faces.
Modern software synthesisers have near-unlimited complexity in producing sounds. For example, after a quick Google, https://www.automatonism.com/the-software shows a modular synthesiser whose modules can be composed in near-unlimited ways. It also comes with a programming interface that would suit being driven by another system programmatically.
Is it possible to combine a GAN and a modular synthesiser? In particular, I'd like to start with a wav file of a sound and have the synthesiser come up with a patch that mimics the provided sound.
I'm no stranger to programming patches and can achieve something similar to this manually. Likewise, I'm sure the GAN-produced patch could be lacking in articulation and modulation of the sound. However, I think it could lead to some interesting "starting places" for sounds which could be further tweaked.
I'd also be interested in how something like this would perform with less complicated synthesisers (i.e. ones that would be easier to tweak manually once the patch had been produced).
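Strictly speaking, the "mimic this wav" part doesn't even need a GAN: you can frame it as a parameter search that minimises a spectral distance to the target. A minimal sketch of that framing, assuming a hypothetical two-parameter sine "synth" and brute-force grid search standing in for a learned model (plain NumPy, nothing Automatonism-specific):

```python
import numpy as np

SR = 8000    # toy sample rate (Hz)
DUR = 0.25   # clip length (s)

def synth(freq, amp, sr=SR, dur=DUR):
    """A stand-in 'patch': a sine oscillator with two parameters."""
    t = np.arange(int(sr * dur)) / sr
    return amp * np.sin(2 * np.pi * freq * t)

def spectral_loss(a, b):
    """MSE between magnitude spectra -- a phase-insensitive comparison."""
    return np.mean((np.abs(np.fft.rfft(a)) - np.abs(np.fft.rfft(b))) ** 2)

# Pretend this came from the wav file of the sound we want to mimic.
target = synth(freq=440.0, amp=0.8)

# Brute-force search over patch parameters; a trained generator or a
# gradient-based optimiser would replace this loop on a real synth.
best = min(
    ((f, a) for f in np.arange(100, 1000, 20)
            for a in np.arange(0.1, 1.1, 0.1)),
    key=lambda p: spectral_loss(synth(*p), target),
)
print(best)  # parameters close to (440, 0.8)
```

A real modular patch has far more (and discrete) parameters, which is exactly where learned approaches become interesting, but the loss-against-a-target-spectrum idea stays the same.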
https://medium.com/qosmo-lab/samplevae-a-multi-purpose-ai-to...
I came across this while looking for a different program that classifies drum samples graphically, with each sample shown as a coloured point whose proximity to other points reflects the similarity of the samples. I think it's an Ableton plugin.
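That "points where proximity means similarity" view is essentially an audio embedding projected to 2D. A rough sketch of the pipeline, assuming toy synthetic "drum hits" and a crude spectral feature vector in place of SampleVAE's learned embedding (plain NumPy; PCA via SVD does the 2D projection):

```python
import numpy as np

SR = 8000  # toy sample rate (Hz)

def fake_sample(freq):
    """Toy stand-in for a drum sample: a decaying sine burst."""
    t = np.arange(SR // 4) / SR
    return np.exp(-8 * t) * np.sin(2 * np.pi * freq * t)

def features(x):
    """Crude embedding: log magnitude spectrum averaged into 25 bands."""
    mag = np.abs(np.fft.rfft(x))[:1000]
    return np.log1p(mag.reshape(25, -1).mean(axis=1))

# Two clusters of "samples": low-pitched and high-pitched hits.
samples = [fake_sample(f) for f in [100, 110, 120, 800, 820, 840]]
X = np.stack([features(s) for s in samples])

# PCA via SVD: project the feature vectors to 2D scatter coordinates.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T  # shape (n_samples, 2)

# Similar samples should land near each other: distance within a pitch
# cluster is much smaller than distance across clusters.
d_within = np.linalg.norm(coords[0] - coords[1])
d_across = np.linalg.norm(coords[0] - coords[3])
print(d_within < d_across)  # True
```

SampleVAE's embedding is learned rather than hand-crafted, and tools in this space often use t-SNE or UMAP instead of PCA for the 2D layout, but the structure of the pipeline is the same.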
Latency might be problematic?
Look into projects using torchaudio