HACKER Q&A
📣 Rodeoclash

Is it possible to use AI to program a synthesiser to mimic a sound?


I have a limited understanding of AI so the question I'm about to ask may be entirely nonsensical.

My layman's understanding of GAN machine learning is that you have two networks working against each other: for example, one producing images of people's faces and the other trying to determine whether a given image is of a person's face or not. When the two networks are pitted against each other, you can eventually generate believable images of faces.
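(For concreteness, that adversarial loop can be sketched in a few lines. This is a toy 1D "GAN" where the real data is just numbers drawn from a normal distribution, the generator is a linear map, and the discriminator is logistic regression; every name and number here is made up for illustration, not a real GAN architecture.)

```python
import numpy as np

# Toy 1D GAN sketch: "real data" are samples from N(3, 0.5).
# Generator: x = w*z + b with z ~ N(0, 1); discriminator: sigmoid(a*x + c).
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w, b = 1.0, 0.0      # generator parameters
a, c = 0.1, 0.0      # discriminator parameters
lr, batch = 0.05, 64

for step in range(2000):
    z = rng.normal(size=batch)
    real = rng.normal(3.0, 0.5, size=batch)
    fake = w * z + b

    # Discriminator step: minimise -[log D(real) + log(1 - D(fake))]
    d_real = sigmoid(a * real + c)
    d_fake = sigmoid(a * fake + c)
    a -= lr * np.mean((d_real - 1.0) * real + d_fake * fake)
    c -= lr * np.mean((d_real - 1.0) + d_fake)

    # Generator step (non-saturating): minimise -log D(fake)
    d_fake = sigmoid(a * (w * z + b) + c)
    w -= lr * np.mean((d_fake - 1.0) * a * z)
    b -= lr * np.mean((d_fake - 1.0) * a)

# After training, generated samples should have drifted toward the real mean of 3.
gen_mean = float(np.mean(w * rng.normal(size=10000) + b))
print(round(gen_mean, 2))
```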

Modern software synthesisers have near-unlimited complexity in producing sounds. For example, after a quick Google, https://www.automatonism.com/the-software shows a modular synthesiser whose modules can be combined in a near-unlimited number of ways. It also comes with a programming interface that would suit plugging it into a system that drives it programmatically.

Is it possible to combine a GAN and a modular synthesiser? In particular, I'd like to start with a wav file of a sound and have the synthesiser come up with a patch that mimics the provided sound.
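(Whatever learning method ends up being used, I imagine the core loop is "render a candidate patch, compare its spectrum to the target wav, keep the closest". A toy sketch of that framing, with a one-oscillator "patch" and a brute-force search — the `render`, `spectral_distance`, and patch-dict names are all mine, not from any real synth API:)

```python
import numpy as np

SR = 8000  # sample rate, Hz

def render(patch, dur=0.25):
    """Toy 'synthesiser': renders a patch (here just one sine oscillator)."""
    t = np.arange(int(SR * dur)) / SR
    return patch["amp"] * np.sin(2 * np.pi * patch["freq"] * t)

def spectral_distance(x, y):
    """Compare magnitude spectra rather than raw samples (phase-insensitive)."""
    X = np.abs(np.fft.rfft(x))
    Y = np.abs(np.fft.rfft(y))
    return float(np.mean((X - Y) ** 2))

# Pretend this came from a wav file.
target = render({"freq": 440.0, "amp": 0.8})

# Brute-force search over candidate patches -- the simplest possible "AI".
candidates = [{"freq": f, "amp": 0.8} for f in np.arange(100.0, 1000.0, 5.0)]
best = min(candidates, key=lambda p: spectral_distance(render(p), target))
print(best["freq"])  # → 440.0
```

A GAN, an evolutionary algorithm, or gradient descent would just be smarter ways of exploring the candidate space than this grid search.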

I'm no stranger to programming patches and can achieve something similar to this manually. Likewise, I'm sure the GAN-produced patch could be lacking in articulation and modulation of the sound. However, I think it could lead to some interesting "starting places" for sounds that could be further tweaked.

I'd also be interested in how something like this would perform with less complicated synthesisers (i.e. ones that would be easier to tweak manually once the patch had been produced).


  👤 yummypaint Accepted Answer ✓
How about SampleVAE?

https://medium.com/qosmo-lab/samplevae-a-multi-purpose-ai-to...

I came across it while looking for a different program that classifies drum samples graphically, plotting each sample as a coloured point so that proximity reflects similarity. I think it's an Ableton plugin.


👤 bacr
There has been a lot of work in this space, and it's a lot of fun to read around and play with. The Magenta team at Google developed DDSP, a differentiable digital signal processing system that is essentially what you are describing. Here it is doing some tone transfer: https://sites.research.google/tonetransfer/about
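(The core idea behind that tone transfer — extract pitch and loudness from the input, then drive a synthesiser with those features to get the same melody in a different timbre — can be sketched without any of the actual DDSP machinery. This is a crude numpy stand-in, with FFT-peak pitch detection and an RMS loudness measure; all function names are mine:)

```python
import numpy as np

SR = 8000  # sample rate, Hz

def analyse(signal):
    """Extract pitch (FFT peak) and loudness (RMS) from a signal -- the kind
    of features DDSP-style tone transfer conditions its synthesiser on."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / SR)
    pitch = float(freqs[np.argmax(spectrum)])
    loudness = float(np.sqrt(np.mean(signal ** 2)))
    return pitch, loudness

def resynthesise(pitch, loudness, n_harmonics=4, dur=0.5):
    """Re-render the same pitch and loudness with a different timbre
    (a small harmonic stack instead of the original waveform)."""
    t = np.arange(int(SR * dur)) / SR
    out = sum(np.sin(2 * np.pi * pitch * k * t) / k
              for k in range(1, n_harmonics + 1))
    return out * loudness / np.sqrt(np.mean(out ** 2))  # match loudness

# Stand-in for a loaded wav file: one second of a 330 Hz sine.
source = 0.5 * np.sin(2 * np.pi * 330.0 * np.arange(SR) / SR)
pitch, loud = analyse(source)
transferred = resynthesise(pitch, loud)
print(round(pitch, 1))  # → 330.0
```

The real system does this frame by frame with a learned decoder, but the analyse-then-resynthesise shape is the same.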

👤 poetically
It sounds like you want variations of a patch rather than the generator-versus-discriminator setup of a GAN. Since there is no discrimination involved in what you described, you might be better served by some kind of evolutionary algorithm instead.
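(A minimal sketch of that evolutionary approach, under toy assumptions: the "patch" is a single oscillator frequency, fitness is closeness of smoothed magnitude spectra, and each generation keeps the fittest candidates and mutates them. The numbers and names are illustrative, not tuned for any real synth:)

```python
import numpy as np

SR, N = 8000, 2048
rng = np.random.default_rng(1)

def spectrum(freq):
    """Smoothed magnitude spectrum of a sine at `freq` (smoothing widens the
    fitness basin so nearby frequencies score partial credit)."""
    t = np.arange(N) / SR
    mag = np.abs(np.fft.rfft(np.sin(2 * np.pi * freq * t)))
    kernel = np.ones(32) / 32
    return np.convolve(mag, kernel, mode="same")

target = spectrum(440.0)  # stand-in for the spectrum of the target wav

def fitness(freq):
    return -float(np.sum((spectrum(freq) - target) ** 2))

# Simple elitist evolution over a single parameter: keep the 10 fittest,
# breed 20 mutated children, with mutation size shrinking each generation.
pop = rng.uniform(100.0, 1000.0, size=30)
for gen in range(40):
    scores = np.array([fitness(f) for f in pop])
    parents = pop[np.argsort(scores)[-10:]]
    children = np.repeat(parents, 2) + rng.normal(0.0, 20.0 / (gen + 1), size=20)
    pop = np.concatenate([parents, children])

best = float(max(pop, key=fitness))
print(round(best, 1))  # should converge near 440
```

With a real modular synth the genome would be the whole patch (module graph plus parameters), but the select-mutate-rescore loop is the same.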

👤 sharemywin
You'd need thousands to millions of sound samples for it to learn from.

👤 lpasselin
I have never worked with audio ML, but I am sure what you describe can be done.

Latency might be problematic?

Look into projects using torchaudio.