effects_demo Application

This application demonstrates how to use the AFX SDK to apply effects to audio.

Build the Application

  1. Navigate to the samples/effects_demo directory.

  2. Run the application using one of the following scripts, which automatically build the sample application.

    • To automatically detect the current GPU and load the corresponding models:

      ./run_effect.sh -s 16 -b 1 -e denoiser
      ./run_effect.sh -s 48 -b 1 -e dereverb
      ./run_effect.sh -s 16 -b 400 -e denoiser
      ./run_effect.sh -s 48 -b 400 -e dereverb_denoiser
      ./run_effect.sh -s 48 -b 400 -e aec
      ./run_effect.sh -s 8 -o 16 -b 400 -e superres
      ./run_effect.sh -s 16 -b 1 -e studio_voice_high_quality
      ./run_effect.sh -s 48 -b 1 -e studio_voice_high_quality
      ./run_effect.sh -s 48 -b 1 -e studio_voice_low_latency
      ./run_effect.sh -s 16 -b 1 -e speaker_focus
      ./run_effect.sh -s 48 -b 1 -e speaker_focus
      ./run_effect.sh -s 16 -b 3 -e voice_font_high_quality
      ./run_effect.sh -s 48 -b 3 -e voice_font_high_quality
      ./run_effect.sh -s 16 -b 3 -e voice_font_low_latency
      ./run_effect.sh -s 48 -b 3 -e voice_font_low_latency
      
    • To explicitly specify the GPU:

      ./run_effect.sh -g t4 -s 16 -b 1 -e denoiser
      ./run_effect.sh -g t4 -s 48 -b 1 -e dereverb
      ./run_effect.sh -g t4 -s 16 -b 400 -e denoiser
      ./run_effect.sh -g t4 -s 48 -b 400 -e dereverb_denoiser
      ./run_effect.sh -g t4 -s 48 -b 400 -e aec
      ./run_effect.sh -g t4 -s 8 -o 16 -b 96 -e superres
      ./run_effect.sh -g t4 -s 16 -b 1 -e studio_voice_high_quality
      ./run_effect.sh -g t4 -s 48 -b 1 -e studio_voice_high_quality
      ./run_effect.sh -g t4 -s 48 -b 1 -e studio_voice_low_latency
      ./run_effect.sh -g t4 -s 16 -b 1 -e speaker_focus
      ./run_effect.sh -g t4 -s 48 -b 1 -e speaker_focus
      ./run_effect.sh -g t4 -s 16 -b 3 -e voice_font_high_quality
      ./run_effect.sh -g t4 -s 48 -b 3 -e voice_font_high_quality
      ./run_effect.sh -g t4 -s 16 -b 3 -e voice_font_low_latency
      ./run_effect.sh -g t4 -s 48 -b 3 -e voice_font_low_latency
      

Note

Ensure that the application uses the TensorRT libraries (exact version required) and the CUDA libraries (exact version or later required) that are bundled with the SDK (under external/cuda/lib).

If the distro exports LD_LIBRARY_PATH from ~/.bashrc or a similar file, the TensorRT and CUDA paths might be overridden, and the SDK might load incompatible CUDA or TensorRT library versions. To avoid this issue, before you run the sample program, prepend the bundled library directory to LD_LIBRARY_PATH by running the following command:

$ export LD_LIBRARY_PATH=external/cuda/lib:$LD_LIBRARY_PATH
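
The one-line export above leaves a trailing colon when LD_LIBRARY_PATH is initially empty, and an empty entry is typically treated as the current directory by the dynamic linker. The following is a minimal sketch of a more defensive variant; the prepend_path function is a hypothetical helper, not part of the SDK.

```shell
# prepend_path DIR CURRENT (hypothetical helper, not shipped with the SDK)
# Prints CURRENT with DIR placed first. Avoids a trailing ':' when CURRENT is
# empty and avoids adding DIR again when it is already the first entry.
prepend_path() {
  if [ -z "$2" ]; then
    printf '%s\n' "$1"
  elif [ "$2" = "$1" ] || [ "${2#"$1":}" != "$2" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$1:$2"
  fi
}

# external/cuda/lib is the bundled library directory named in the note above.
LD_LIBRARY_PATH=$(prepend_path external/cuda/lib "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```

Because the path is relative, this sketch assumes that the sample is started from the directory that contains external/.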

Note

The sample app might hit the limit for the maximum number of open files that is imposed by default by the Linux kernel, especially for large batch sizes. When this occurs, the sample application exits with the following error message:

[Error] Unable to read wav file:
../input_files/denoiser/48k/Fan_48k.wav.

Open file limit reached.

To increase this limit, before you run the sample application, use the ulimit command in the same shell to increase the number of open files. For example, ulimit -n 20000 increases the open file limit to 20,000 for that shell. For more information, refer to the documentation of your distribution on how to increase open file descriptor limits.
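
As a hedged sketch, the limit can be raised only when needed and clamped to the hard limit, so that the command does not fail in shells where 20,000 exceeds what the user is allowed to set. The clamp_limit function is a hypothetical helper, and 20000 is simply the example value from the text above.

```shell
# clamp_limit DESIRED HARD (hypothetical helper)
# Prints DESIRED, or HARD when the hard limit is lower than DESIRED.
clamp_limit() {
  if [ "$2" = "unlimited" ] || [ "$1" -le "$2" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$2"
  fi
}

desired=20000                 # example value from the text above
soft=$(ulimit -Sn)            # current soft limit for open files
hard=$(ulimit -Hn)            # ceiling that an unprivileged user can reach
target=$(clamp_limit "$desired" "$hard")

# Raise only the soft limit, and only if it is currently below the target.
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$target" ]; then
  ulimit -Sn "$target"
fi
echo "open file limit for this shell: $(ulimit -Sn)"
```

The sketch sets only the soft limit (-S); in bash, ulimit -n without -S or -H sets both limits, and a lowered hard limit cannot be raised again in that shell by a non-root user.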

Run the Application

You can run the sample application by using the run_effect.sh helper script or by directly using the effects_demo executable.

Note

This sample app processes files in an offline manner. Hence, for Voice Font and Studio Voice, the processing time for the low latency effect is expected to be equal to or higher than the processing time for the high quality effect. This is because the high quality effect processes audio in larger input audio chunks, and requires fewer processing calls to the effect when compared to the low latency effect (which processes audio in very small input chunks). For more details, refer to About the Studio Voice Effect and About the Voice Font Effect.

Use the Helper Script to Run the Sample Application

The run_effect.sh helper script is a wrapper around the effects_demo application. Depending on the arguments passed to run_effect.sh, the script generates a temporary config file and runs the effects_demo application with this generated config file.

Run the helper script by using the following command:

./run_effect.sh -g <gpu> -e <effect> -s <input_sample_rate> -o <output_sample_rate> -b <batch_size> -i <input_file_or_folder>

For example, to run the sample application on a T4 GPU with the 16-kHz denoiser effect and a batch size of 10, run the following command:

./run_effect.sh -g t4 -s 16 -b 10 -e denoiser

This command generates a config file at /tmp/tmp_cfg.txt with the 16kHz denoiser model for T4, the sample rate initialized to 16kHz, and input/output file list based on batch size, and then runs effects_demo using this configuration file.

The helper script supports the following parameters:

  • -i/--input-file specifies the input files/folder on which to run the effect.

    If this parameter is not specified, the helper script uses the sample files that are distributed with the SDK (in the samples/input_files directory). If this parameter specifies a file, the helper script uses that file; if it specifies a folder, the script uses the files in that folder.

    The supported value is a path to an input file in the correct format (refer to Run the Sample Application Directly for more information) or a folder that contains multiple input files in the correct format. If a folder is specified, only the files at the top level are processed. For example, if the input folder is folder1, then folder1/a.wav, folder1/b.wav, and so on are processed, but folder1/subfolder/a.wav is not.

  • -g/--gpu specifies the GPU on which to run the effect. If not specified, the current GPU (device 0) is used. Supported values are a2, a16, a100, a10, t4, a30, a40, l4, l40, h100, b100, b200, and rtx_pro_6000. The helper script selects the appropriate model based on the value of this parameter. If a model is not specified, the default value is t4.

  • -e/--effect specifies which effect to use:

    • denoiser

    • dereverb

    • dereverb_denoiser

    • aec

    • superres

    • studio_voice_high_quality

    • studio_voice_low_latency

    • speaker_focus

    • voice_font_high_quality

    • voice_font_low_latency

    If an effect is not specified, the default value is denoiser.

  • -s/--sample_rate specifies the sample rate of the input audio in kHz (8, 16, or 48). If the rate is not specified, the default value is 16.

  • -o/--output_sample_rate (Superresolution only) specifies the sample rate of the output audio in kHz. If the rate is not specified, the default value is 16.

  • -b/--batch_size specifies the batch size to use.

    Depending on this parameter, the script generates an input file list (in samples/input_files) of the specified batch size by using the sample audio files provided with the SDK and a corresponding list of the output files. If the batch size is not specified, the default value is 1. The maximum value supported by this parameter is 1024.

  • -c/--cfg-file specifies the path to which the temporary configuration file will be written. If the path is not specified, the default location is /tmp/tmp_cfg.txt.

  • -f/--frame_size specifies the frame size in milliseconds (10 or 20). If the frame size is not specified, the default value is 10.

  • -m/--effect_version (Denoiser only) specifies the version of the effect to use. Set to 1 for the legacy Denoiser model or 2 for the experimental BNR 2.0 model. For more details, refer to About the Noise Removal/Background Noise Suppression Effect.

  • -h/--help prints the parameters that are supported by this script.
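
The documented value ranges can be checked before the helper script is called. The following is a minimal sketch: check_run_effect_args is a hypothetical function that covers only the -s, -f, and -b ranges stated in the list above, and the helper script might perform its own validation.

```shell
# check_run_effect_args RATE_KHZ FRAME_MS BATCH (hypothetical helper)
# Returns 0 when the values match the ranges documented above, 1 otherwise.
# The body runs in a subshell, so "exit" only ends the function.
check_run_effect_args() (
  rate=$1; frame=$2; batch=$3
  case $rate in
    8|16|48) ;;
    *) echo "unsupported sample rate (kHz): $rate" >&2; exit 1 ;;
  esac
  case $frame in
    10|20) ;;
    *) echo "unsupported frame size (ms): $frame" >&2; exit 1 ;;
  esac
  case $batch in
    ''|*[!0-9]*) echo "batch size must be a positive integer" >&2; exit 1 ;;
  esac
  if [ "$batch" -lt 1 ] || [ "$batch" -gt 1024 ]; then
    echo "batch size must be between 1 and 1024: $batch" >&2
    exit 1
  fi
)

# Values from the example command above: -s 16 -b 10, with the default -f 10.
if check_run_effect_args 16 10 10; then
  echo "arguments are within the documented ranges"
fi
```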

Run the Sample Application Directly

Before running the sample application, ensure that it is built, either by running ./build.sh or by calling run_effect.sh or run_effect_chained.sh, both of which build the sample app automatically. The sample app is built in the build folder.

To run the sample application after it is built, run the following command:

build/effects_demo -c <config-file>

<config-file> specifies the path of the sample config file, such as t4_denoise48k_1_cfg.txt. Sample config files for 16-kHz and 48-kHz audio are provided with the sample application.

Note

Config files that are used by the sample app can be generated by using the run_effect.sh script, which accepts a path specified by the -c or the --cfg-file flag. If this path is specified, the script writes a config file with the specified configuration parameters to that path. This config file can be reused by the effects_demo sample app.

For example, the following command writes the configuration to the file t4_aec.cfg:

./run_effect.sh -e aec -s 48 -g t4 -c t4_aec.cfg

For example, to denoise a 48-kHz stream on a T4 GPU for a batch size of 1, run:

build/effects_demo -c t4_denoise48k_1_cfg.txt

The configuration files contain pairs of parameters and their values, with one pair per line. Currently, the following parameters are supported:

reset <list-of-stream-ids>

Specifies the stream identifiers to reset, starting with 1. Multiple identifiers are separated by spaces.

effect <effect-name>

Specifies the name of the effect to apply:

  • denoiser

  • dereverb

  • dereverb_denoiser

  • aec

  • superres

  • studio_voice_high_quality

  • studio_voice_low_latency

  • speaker_focus

  • voice_font_high_quality

  • voice_font_low_latency

sample_rate <audio-sample-rate>

Specifies the sample rate of the input audio in Hz. Supported values are 8000, 16000, and 48000.

model <model-file>

Specifies the path of the model file to be used in the sample application, such as models/sm_70/denoiser_48k_1152.trtpkg. The model file should match the audio sample rate that was specified in the sample_rate parameter and the sample rate of input wav files specified in input_wav_list parameter. (For more information, refer to Set the Parameters of an Audio Effect.)

For Voice Font, both the reference and the input models must be specified in the format model <reference-model-file> <input-model-file>.

frame_size <frame-size-value-in-milliseconds>

Specifies the input frame size (in milliseconds) to be used in the NvAFX_Run() call.

The supported values are as follows:

  • 10 or 20 for all effects unless another value is explicitly stated.

  • 6000 for Studio Voice High Quality and 10 for Studio Voice Low Latency.

  • 800 for Voice Font High Quality and 160 for Voice Font Low Latency.
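
To relate these frame sizes to sample counts, the number of samples in one frame of mono audio is the sample rate multiplied by the frame duration. This is plain arithmetic rather than an SDK call, and samples_per_frame is a hypothetical helper name.

```shell
# samples_per_frame SAMPLE_RATE_HZ FRAME_MS (hypothetical helper)
# Prints the number of mono samples in one frame: rate * ms / 1000.
samples_per_frame() {
  echo $(( $1 * $2 / 1000 ))
}

samples_per_frame 48000 10     # prints 480
samples_per_frame 16000 20     # prints 320
samples_per_frame 48000 6000   # prints 288000
```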

input_wav_list <input-audio-file-list>

Specifies a list of paths to the input audio .wav files to use. Each file should contain mono audio in signed 16-bit or 32-bit float format with a basic WAV header. Multiple files are separated by spaces. The number of entries must match the number of streams (the batch size). Within a stream, files that are separated by a semicolon (;) are processed one after another in the same stream. In addition, if the stream ID exists in the reset list, NvAFX_Reset is called on that stream when switching between files.

For example, the following configuration specifies that streams 1, 2, and 4 use file1.wav, file2.wav, and file6.wav as the input to the stream, and stream 3 uses multiple files (file3.wav, file4.wav, file5.wav) as the input to the stream:

input_wav_list file1.wav file2.wav file3.wav;file4.wav;file5.wav file6.wav

Note

Sample input audio files are included with the sample application in the samples/input_files/16k directory and the samples/input_files/48k directory.

input_farend_wav_list <input-farend-audio-file-list>

(AEC only) Specifies a list of paths to input noisy audio .wav files to be used as farend audio. Each entry in this list matches a nearend input that was specified in the input_wav_list. The number of audio samples in each input file must be the same as the number of samples in the corresponding nearend input file.

reference_wav_list <reference-audio-file-list>

(Voice Font only) Specifies a list of paths to reference audio .wav files to be used as reference audio for the Voice Font effect. Each entry in this list matches an input that was specified in the input_wav_list. Each file must contain more than 30 seconds of audio (the first 30 seconds are used as the reference).

output_wav_list <output-audio-file-list>

Specifies the files to which the output audio will be written. Output files contain mono audio in 32-bit float format. Multiple files are separated by spaces. In a stream, if multiple input files are specified (separated using semicolon), multiple output files will be created with the same name followed by _1, _2, and so on.

For example, in the following configuration, the output will be written to out1.wav (output of file1.wav), out2.wav (output of file2.wav), out3.wav (output of file3.wav), out3_1.wav (output of file4.wav), out3_2.wav (output of file5.wav), and out4.wav (output of file6.wav):

input_wav_list file1.wav file2.wav file3.wav;file4.wav;file5.wav file6.wav

output_wav_list out1.wav out2.wav out3.wav out4.wav
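
The naming rule can be sketched as follows. The expand_outputs function is a hypothetical helper that reproduces the rule as described above; it is not taken from the sample application's source, and it assumes that output names end in .wav.

```shell
# expand_outputs INPUTS OUTPUT (hypothetical helper)
# INPUTS is one stream entry from input_wav_list (files joined by ';').
# OUTPUT is the matching entry from output_wav_list.
# Prints one output file name per input file, following the rule above.
# The body runs in a subshell, so the IFS change does not leak out.
expand_outputs() (
  base=${2%.wav}
  n=0
  set -f          # disable globbing while splitting the entry
  IFS=';'
  for _input in $1; do
    if [ "$n" -eq 0 ]; then
      printf '%s\n' "$2"
    else
      printf '%s_%s.wav\n' "$base" "$n"
    fi
    n=$((n + 1))
  done
)

expand_outputs 'file3.wav;file4.wav;file5.wav' out3.wav
# prints out3.wav, out3_1.wav, and out3_2.wav on separate lines
```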

Note

In input/output .wav files, only the basic WAV header is supported.

real_time <enable>

Simulates real-time audio input. Set to 1 to enable or 0 to disable (disabled by default). When this option is enabled, each audio frame is passed to the SDK with a delay, similar to how audio is received from a physical device or stream. For example, if the frame size is 10 ms, one frame is passed in every 10 ms, in the same way that a physical microphone delivers 10 ms of audio approximately every 10 ms.

intensity_ratio <[ratio|list_of_ratios]>

Specifies the denoising intensity ratio. This value can either be a single value (applied to all streams) or a value per stream (individual values applied to each corresponding stream).

The value of this parameter ranges from 0.0 to 1.0 (inclusive), where a higher value indicates a stronger suppression of noise/reverb. A value of 0.0 is equivalent to passing the input audio through without applying noise removal/dereverb.

effect_version <effect_version>

(Denoiser only) Specifies the version for the effect. Set to 1 to use the original BNR effect and 2 to use the experimental BNR 2.0 effect. For more details, see About the Noise Removal/Background Noise Suppression Effect.
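
Putting the parameters together, a config file can be written by hand. The following is a minimal sketch, not a file shipped with the SDK: the .wav file names are hypothetical, the model path is the example path from the model parameter above (whether that model suits a two-stream batch on a given GPU is an assumption), and the line order is assumed not to matter.

```shell
# Write a two-stream 48-kHz denoiser config to a temporary file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
effect denoiser
sample_rate 48000
model models/sm_70/denoiser_48k_1152.trtpkg
frame_size 10
input_wav_list file1.wav file2.wav
output_wav_list out1.wav out2.wav
real_time 0
intensity_ratio 1.0
EOF
echo "wrote $cfg"

# Then run the sample with it (left commented out in this sketch):
# build/effects_demo -c "$cfg"
```

The input and output lists each have two entries, which matches the requirement that the number of entries equals the number of streams.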

Chain Multiple Effects

This sample application also supports chaining multiple effects. (For more information, refer to Run Multiple Audio Effects in a Chain.)

To run the application in chaining mode, use run_effect_chained.sh:

./run_effect_chained.sh -g <gpu> -e1 <effect1> -s1 <input_sample_rate_1> -o1 <output_sample_rate_1> -e2 <effect2> -s2 <input_sample_rate_2> -o2 <output_sample_rate_2> [-c <path_to_save_config_file>] [-i <input_file_or_folder>]

This script generates a config file that can be used with the effects_demo sample to run multiple effects in a chain, and then runs the application with this file.

The config file that is used for chaining follows the same format and parameters as effects_demo, with the following modifications:

effect <effect-name-1> <effect-name-2>

Specifies the names of the effects to apply to input audio (effect-name-1 will be applied to input audio first, and effect-name-2 will be applied to this output). For more information about chaining combinations, see Create a Chained Audio Effect.

Note

Chaining supports only the following combinations: Superres with the Denoiser, Dereverb, or combined Denoiser+Dereverb effect, and Denoiser with Speaker Focus. Other effect chains are not supported.

If you combine the Denoiser effect and Dereverb effect, use the combined Denoiser+Dereverb model. For more information, see About the Noise Removal/Background Noise Suppression Effect.
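
The supported combinations can be encoded as a simple lookup that is run before a chained config is generated. This is a sketch only: is_supported_chain is a hypothetical helper, and it assumes that a pair is accepted in either order, which the note above does not state. Ordering and sample-rate constraints are described in Create a Chained Audio Effect.

```shell
# is_supported_chain EFFECT1 EFFECT2 (hypothetical helper)
# Returns 0 when the pair is one of the combinations listed in the note above.
# Assumption: either order of the pair is accepted.
is_supported_chain() {
  case "$1+$2" in
    superres+denoiser|denoiser+superres) return 0 ;;
    superres+dereverb|dereverb+superres) return 0 ;;
    superres+dereverb_denoiser|dereverb_denoiser+superres) return 0 ;;
    denoiser+speaker_focus|speaker_focus+denoiser) return 0 ;;
    *) return 1 ;;
  esac
}

if is_supported_chain superres denoiser; then
  echo "superres + denoiser is a listed combination"
fi
```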

sample_rate <audio-sample-rate-1> <audio-sample-rate-2>

Specifies the input sample rate of the audio in Hz for the effects. The supported values are 8000, 16000, and 48000.

model <model-file-1> <model-file-2>

Specifies the path of the model file to be used by the effects, for example, models/sm_70/denoiser_48k_1152.trtpkg. The model file should match the audio sample rate that was specified in the sample_rate parameter and the sample rate of input wav files specified in input_wav_list parameter. (For more information, refer to Set the Parameters of an Audio Effect.)

intensity_ratio <ratio-1> <ratio-2>

Specifies the denoising intensity ratio of the first and the second effect. The value of this parameter ranges from 0.0 to 1.0 (inclusive), where a higher value indicates a stronger suppression of noise/reverb. A value of 0.0 is equivalent to passing the input audio through without applying noise removal/dereverb.

chained_effect_gpu_list <gpu-1> <gpu-2>

In a multi-GPU system, specifies the GPU device ID that will be used for the first and the second effect in the chain.

The supported value for the -i parameter of run_effect_chained.sh is a path to an input file in the correct format (see Run the Sample Application Directly) or a folder that contains multiple input files in the correct format. If a folder is specified, only the files at the top level are processed. For example, if the input folder is folder1, the files folder1/a.wav, folder1/b.wav, and so on are processed, but folder1/subfolder/a.wav is not processed.