About the Acoustic Echo Cancellation Effect

Note

In this guide, the term Acoustic Echo Cancellation is used interchangeably with AEC (referred to as aec in the API).

An acoustic echo occurs when a microphone, also known as a near-end microphone, picks up audio from the speaker and sends it back to the original recipient. The original recipient then hears their own delayed voice mixed with the target signal, which can make the communication unintelligible.

The Acoustic Echo Cancellation Effect cancels or suppresses this delayed voice, also known as acoustic feedback or echo, from the audio. This process improves the overall quality of the recording.

To run the sample application on Windows for this effect, use the following commands:

:: Format: run_effect_demo.bat <architecture> <effect> <input_sample_rate> <output_sample_rate>

:: 16k effect
run_effect_demo.bat turing aec 16k 16k

:: 48k effect
run_effect_demo.bat ampere aec 48k 48k

Note

For more information, see Use the Helper Script to Run the Sample Application.

To run the sample application on Linux for this effect, use the following commands:

# (One time, initial setup): Download models using models/download_models.sh
./download_models.sh --gpu <gpu> --effects aec-16k,aec-48k

# Refer to Section 3.2 for further details
# Format: ./run_effect.sh -g <gpu> -s <sample_rate> -e aec

# 16k effect
./run_effect.sh -g t4 -s 16 -e aec

# 48k effect
./run_effect.sh -g t4 -s 48 -e aec

Note

For more information, see Use the Helper Script to Run the Sample Application.

This effect has the following characteristics:

Supported input/output audio is 32-bit float audio with a sampling rate of 16 kHz or 48 kHz.

In the Linux SDK, this effect has the following maximum throughput (the number of batches supported in real time):

Architecture	Maximum Throughput for the 16K Effect	Maximum Throughput for the 48K Effect
T4	1940	800
A100	9240	4050
A10	4200	1650
L40	7770	3000
H100	11000	4760
B100	15400	7150
RTX PRO 6000	15400	6750
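As a rough illustration of how the throughput table might be used for capacity planning, the following Python sketch estimates how many GPUs a given number of concurrent streams would need. The function name `gpus_needed` is hypothetical and not part of the SDK. The sketch assumes that each stream occupies one batch slot, that throughput scales linearly across GPUs, and that no headroom is reserved; real deployments may need a safety margin.

```python
import math

# Maximum throughput per GPU as (16k effect, 48k effect),
# copied from the table above.
MAX_THROUGHPUT = {
    "T4": (1940, 800),
    "A100": (9240, 4050),
    "A10": (4200, 1650),
    "L40": (7770, 3000),
    "H100": (11000, 4760),
    "B100": (15400, 7150),
    "RTX PRO 6000": (15400, 6750),
}


def gpus_needed(gpu, sample_rate_khz, streams):
    """Estimate the GPU count for `streams` concurrent real-time streams.

    Hypothetical helper: assumes one stream per batch slot and
    linear scaling across GPUs, with no reserved headroom.
    """
    column = {16: 0, 48: 1}[sample_rate_khz]
    per_gpu = MAX_THROUGHPUT[gpu][column]
    return math.ceil(streams / per_gpu)


if __name__ == "__main__":
    # 2000 streams at 48 kHz on T4 GPUs (800 streams per GPU)
    print(gpus_needed("T4", 48, 2000))
```

Under these assumptions, 2000 streams at 48 kHz on T4 would need three GPUs, because each T4 supports 800 streams in real time.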

Figure 1-1. Basic AEC Scenario

The AEC Effect takes the following inputs:

The near-end microphone signal (denoted by y).
The far-end speaker signal (denoted by x).

The far-end speaker signal x is the microphone signal of the original recipient. The near-end microphone signal (y) can be described as a combination of the near-end speech signal s and echo signal of the far-end speaker e. The output of the effect is the near-end speech signal s', which is the input combination s + e with the far-end echo signal e removed: s' = (mixture of s + e) - e.

If only the far-end echo signal e is present and the near-end signal s is silent, the output from this effect is silent.
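The signal model above can be illustrated with a small Python sketch. This is an idealized toy, not how the effect works internally: the helper `ideal_aec` is hypothetical and is handed the echo e directly, whereas the real effect only receives y and x and has to estimate the echo itself. The sample values are arbitrary.

```python
def mix(s, e):
    """Near-end microphone signal y: near-end speech s plus echo e."""
    return [a + b for a, b in zip(s, e)]


def ideal_aec(y, e):
    """Toy 'oracle' canceller: s' = (mixture of s + e) - e.

    Hypothetical helper for illustration only. It assumes the echo e
    is known exactly, which is not the case for the real effect.
    """
    return [a - b for a, b in zip(y, e)]


if __name__ == "__main__":
    s = [0.5, -0.25, 0.0]   # near-end speech (arbitrary samples)
    e = [0.25, 0.25, -0.5]  # echo of the far-end speaker
    y = mix(s, e)
    print(ideal_aec(y, e))  # recovers s

    # Near-end speaker is silent: only the echo reaches the microphone,
    # so the output is silent as well.
    silent = [0.0, 0.0, 0.0]
    print(ideal_aec(mix(silent, e), e))
```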

When the AEC effect is integrated in a conferencing application server, multiple streams of data need to run in a batch, one for each speaker. Consider the scenario in figure 1-2, where s(1) corresponds to AEC batch 1 and s(2) corresponds to batch 2.

Figure 1-2. Batched Audio Processing Using the AEC Effect (Linux SDK Only)

The following steps describe how the AEC effect processes audio as seen in Figure 1-2:

1. The application server receives a microphone recording from Speaker A, y(1).
2. The application server passes y(1) to AEC batch (1). Silence is passed as the far-end speech signal x(1) to the effect because the server does not yet have the far-end speech.
3. The effect produces processed audio s'(1), which is passed to Speaker B.
4. Speaker B sends the near-end audio y(2) to the application server. This data consists of speech (s(2)) and audio played on the speakers (e(2)).
5. The application server processes batch (2) with y(2) as near-end audio, and the s'(1) that was produced in Step 3 as far-end audio. This is the same audio that was played on Speaker B's speakers.
6. The output from batch (2), s'(2), is passed to Speaker A.
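The data flow in the steps above can be sketched in Python as follows. This is a simulation under stated assumptions, not SDK code: `toy_aec` is a hypothetical stand-in for one AEC batch, and the echo path is assumed to be a fixed gain (`ECHO_GAIN`) with no delay, which the stand-in is allowed to know. The real effect estimates the echo from the near-end and far-end inputs.

```python
ECHO_GAIN = 0.5  # assumed speaker-to-microphone coupling (hypothetical)


def toy_aec(near_end, far_end):
    """Hypothetical stand-in for one AEC batch; NOT the SDK API.

    Removes the assumed echo (ECHO_GAIN * far_end) from near_end.
    """
    return [y - ECHO_GAIN * x for y, x in zip(near_end, far_end)]


def relay(s1, s2):
    """Simulate one round trip between Speaker A and Speaker B."""
    # Server receives y(1) from Speaker A; no far-end audio exists yet,
    # so silence is used as x(1).
    y1 = list(s1)
    x1 = [0.0] * len(y1)
    # Batch 1 produces s'(1), which is sent to Speaker B.
    s1_out = toy_aec(y1, x1)
    # Speaker B's microphone picks up B's speech plus the echo e(2)
    # of s'(1) played on B's speakers.
    e2 = [ECHO_GAIN * v for v in s1_out]
    y2 = [a + b for a, b in zip(s2, e2)]
    # Batch 2 uses Speaker B's near-end audio, with s'(1) as far-end.
    s2_out = toy_aec(y2, s1_out)
    # s'(2) is what gets passed back to Speaker A.
    return s1_out, s2_out


if __name__ == "__main__":
    print(relay([0.5, -0.5], [0.25, 0.0]))
```

In this toy setup, the audio returned to Speaker A contains only Speaker B's speech, because the echo of s'(1) has been removed in batch 2.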

For the settings required for the AEC effect, see Set the Parameters of an Audio Effect.