Does anyone know how Chrome and Chromium achieve this audio isolation?
Given that Chromium is open source, it would be helpful if someone could point me to the specific part of the codebase that handles this. Any insights or technical details would be greatly appreciated!
Within a single process, or a tree of processes that can cooperate, this is straightforward to do (modulo the actual audio signal processing, which isn't): keep around the last few hundred milliseconds of what you're playing, compare it to what you're getting from the microphone, find correlations, and cancel.
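A minimal sketch of that idea in Python/NumPy, using a normalized LMS adaptive filter (my own illustration, not Chromium's actual code). `mic` and `ref` are hypothetical equal-length buffers of the captured and played-back signals; a real canceller also has to handle delay estimation, double-talk detection, and nonlinear residue:

```python
import numpy as np

def nlms_echo_cancel(mic, ref, taps=256, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of `ref` (what we played) from `mic` (what we captured)."""
    w = np.zeros(taps)                     # running estimate of the echo path (speaker -> room -> mic)
    out = np.empty_like(mic)
    padded = np.concatenate([np.zeros(taps - 1), ref])
    for n in range(len(mic)):
        x = padded[n:n + taps][::-1]       # the most recent `taps` reference samples, newest first
        e = mic[n] - w @ x                 # residual = mic minus predicted echo
        out[n] = e
        w += (mu * e / (x @ x + eps)) * x  # NLMS update: nudge taps toward whatever still correlates
    return out
```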
If the processes aren't related, there are multiple ways to do this. Sometimes the OS provides a capture API that does the cancellation itself, since the OS knows what is being output, and you can use that; this is what happens e.g. on macOS for Firefox and Safari, and it's often available on mobile as well.
Sometimes (Linux desktop, Windows) the OS instead provides a loopback stream: a way to capture the audio that is being played back, which can similarly be used as the cancellation reference.
If none of this is available, you mix your own audio output, perform the cancellation against that mix yourself, and you get the behaviour you observe.
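Continuing the hypothetical sketch above, the reference is then just the sum of everything the browser itself is rendering, which is also why audio coming from an unrelated process never makes it into the reference and can't be cancelled this way:

```python
# Hypothetical buffers: one per tab/element currently playing, plus the raw capture.
tab_outputs = [tab_a_audio, tab_b_audio]
reference = np.sum(tab_outputs, axis=0)      # the browser's own mixed output
cleaned_mic = nlms_echo_cancel(mic_capture, reference)
```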
Source: I do this, but at Mozilla; we unsurprisingly have the same problems and solutions.
Can't tell you anything else due to NDAs.
https://news.ycombinator.com/item?id=39669626
> I've been working on an audio application for a little bit, and was shocked to find Chrome handles simultaneous recording & playback very poorly. Made this site to demo the issue as clearly as possible
It's a fairly common problem in signal processing [1], and comes up in "simple" devices like telephones too.
[1] https://www.mathworks.com/help/audio/ug/acoustic-echo-cancel...
This is needed because many people don't use headphones, and if you have more than one endpoint with a mic and speakers open, you will get feedback galore unless you do something to suppress it.
I'd say it depends on the combination of hardware, software, and OS, each of which handles a piece of how the audio routing comes together.
Generally you have to see what's available, how it can or can't be routed, and what software or settings could be enabled or added to introduce more flexibility in routing, and then make the audio routing work the way you want.
More specifically, some data points:
SOUND DRIVERS: Part of this can be managed by the sound drivers on the computer. Applications like web browsers can access those settings or the list of available devices.
Software drivers can let you pick what's playing on a computer, and how that surfaces specifically in browsers can vary.
CHANNELS: There are often separate channels for everything: physical headphone/microphone jacks, etc. They all show up as devices with input and output channels.
ROUTING: The input into a microphone can be just the voice, and/or system audio. System audio can further be broken down into specific applications or sources. OBS has some nice examples of this functionality.
ADVANCED ROUTING: There are also virtual audio drivers that can help you achieve the audio isolation or workflow folks are after.
E.g. PulseAudio and PipeWire have a module for echo cancellation.
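If I remember correctly, on PulseAudio that is module-echo-cancel (loaded e.g. with `pactl load-module module-echo-cancel aec_method=webrtc`), which exposes a virtual source/sink pair whose source already has the playback subtracted, so applications just record from that source; PipeWire has an equivalent echo-cancel module.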
There's a similar question on SO: https://stackoverflow.com/questions/21795944/remove-known-au...
What's really interesting is I can get the algorithm to "mess up" by using external speakers a foot or two away from my computer's mic! Just that little bit of travel time is enough to screw with the algo.
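For scale: two feet is roughly 0.6 m, which at ~343 m/s adds about 1.8 ms of acoustic delay, or around 85 samples at 48 kHz, on top of the normal output/input buffering. That's enough to shift where the correlation peak sits, so the canceller's delay estimate has to adapt to it.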
Surprised to hear that it doesn't seem to work for you when the audio is generated by a different browser; this shouldn't make a difference.