Does anyone know how Chrome and Chromium achieve this audio isolation?
Given that Chromium is open source, it would be helpful if someone could point me to the specific part of the codebase that handles this. Any insights or technical details would be greatly appreciated!
Within a single process, or a tree of processes that can cooperate, this is straightforward to do (modulo the actual audio signal processing, which isn't): keep around the last few hundred milliseconds of what you're playing, compare it to what you're getting from the microphone, find correlations, and cancel.
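A minimal sketch of that idea in Python/NumPy, using a normalized LMS adaptive filter (my own illustration, not Chromium's actual code). `mic` and `ref` are hypothetical equal-length buffers of the captured and played-back signals; a real canceller also has to handle delay estimation, double-talk detection, and nonlinear residue:

```python
import numpy as np

def nlms_echo_cancel(mic, ref, taps=256, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of `ref` (what we played) from `mic` (what we captured)."""
    w = np.zeros(taps)                     # running estimate of the echo path (speaker -> room -> mic)
    out = np.empty_like(mic)
    padded = np.concatenate([np.zeros(taps - 1), ref])
    for n in range(len(mic)):
        x = padded[n:n + taps][::-1]       # the most recent `taps` reference samples, newest first
        e = mic[n] - w @ x                 # residual = mic minus predicted echo
        out[n] = e
        w += (mu * e / (x @ x + eps)) * x  # NLMS update: nudge taps toward whatever still correlates
    return out
```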
If the processes aren't related, there are multiple ways to do this. Sometimes the OS provides a capture API that does the cancellation itself, since the OS knows what is being output, and you can use that; this is what happens e.g. on macOS for Firefox and Safari, and it's often available on mobile as well.
Sometimes (Linux desktop, Windows) the OS instead provides a loopback stream: a way to capture the audio that is being played back, which can similarly be used as the cancellation reference.
If none of this is available, you mix your own audio output, perform the cancellation against that mix yourself, and you get the behaviour you observe.
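Continuing the hypothetical sketch above, the reference is then just the sum of everything the browser itself is rendering, which is also why audio coming from an unrelated process never makes it into the reference and can't be cancelled this way:

```python
# Hypothetical buffers: one per tab/element currently playing, plus the raw capture.
tab_outputs = [tab_a_audio, tab_b_audio]
reference = np.sum(tab_outputs, axis=0)      # the browser's own mixed output
cleaned_mic = nlms_echo_cancel(mic_capture, reference)
```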
Source: I do this, but at Mozilla; we unsurprisingly have the same problems and solutions.
Can't tell you anything else due to NDAs.
https://news.ycombinator.com/item?id=39669626
> I've been working on an audio application for a little bit, and was shocked to find Chrome handles simultaneous recording & playback very poorly. Made this site to demo the issue as clearly as possible
It's a fairly common problem in signal processing [1], and comes up in "simple" devices like telephones too.
[1] https://www.mathworks.com/help/audio/ug/acoustic-echo-cancel...
This is needed because many people don't use headphones, and if you have more than one endpoint with a mic and speakers open, you will get feedback galore unless you do something to suppress it.
I'd say it depends on the combination of hardware, software, and OS, each of which handles a piece of how the audio routing comes together.
Generally you have to see what's available, how it can or can't be routed, and what software or settings could be enabled or added to introduce more flexibility in routing, and then make the audio routing work the way you want.
More specifically, some data points:
SOUND DRIVERS: Part of this can be managed by the sound drivers on the computer. Applications like web browsers can access those settings or the list of available devices.
Software drivers can let you pick what's playing on a computer, and how that surfaces specifically in browsers can vary.
CHANNELS: There are often separate channels for everything: physical headphone/microphone jacks, etc. They all show up as devices with input and output channels.
ROUTING: The input into a microphone can be just the voice, and/or system audio. System audio can further be broken down into specific applications or sources. OBS has some nice examples of this functionality.
ADVANCED ROUTING: There are also virtual audio drivers that can help you achieve the audio isolation or workflow folks are after.
E.g. PulseAudio and PipeWire have a module for echo cancellation.
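If I remember correctly, on PulseAudio that is module-echo-cancel (loaded e.g. with `pactl load-module module-echo-cancel aec_method=webrtc`), which exposes a virtual source/sink pair whose source already has the playback subtracted, so applications just record from that source; PipeWire has an equivalent echo-cancel module.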
There's a similar question on SO: https://stackoverflow.com/questions/21795944/remove-known-au...
What's really interesting is I can get the algorithm to "mess up" by using external speakers a foot or two away from my computer's mic! Just that little bit of travel time is enough to screw with the algo.
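For scale: two feet is roughly 0.6 m, which at ~343 m/s adds about 1.8 ms of acoustic delay, or around 85 samples at 48 kHz, on top of the normal output/input buffering. That's enough to shift where the correlation peak sits, so the canceller's delay estimate has to adapt to it.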
Surprised to hear that it doesn't seem to work for you when the audio is generated by a different browser; this shouldn't make a difference.