Set the Parameters of an Audio Effect

An audio effect requires a model to transform the input audio, and each model supports a specific audio sample rate. The path to the model file and the input audio sample rate (Linux SDK only) must be set in the SDK. After the required parameters for the effect are set, the effect can be loaded by calling NvAFX_Load.

The SDK also supports several frame sizes (the number of samples per frame), which can be queried and set in the SDK. (For more information, see Get the Parameters of an Audio Effect.)

To set U32 values, call the NvAFX_SetU32() function with the following parameters:

Previously created effect handle.
The selector string for the parameter to be set:
- To set the number of samples per input frame, specify NVAFX_PARAM_NUM_SAMPLES_PER_INPUT_FRAME.
- In a multi-GPU setup, to have the SDK automatically select the GPU that is compatible with the model set in the SDK, set the NVAFX_PARAM_USE_DEFAULT_GPU parameter to 1 (the default value is 0). This parameter is not supported by chained effects.
- The Noise Removal and Room Echo Removal/Room Echo Cancellation effects support voice activity detection (VAD), which indicates whether the audio data frame supplied to the SDK through NvAFX_Run contains a voice.
  
  When NVAFX_PARAM_ENABLE_VAD is enabled, this feature also removes low-volume noise and all non-speech data from the NvAFX_Run output without degrading performance.
  
  To enable this feature, set the NVAFX_PARAM_ENABLE_VAD parameter to 1. (The default value is 1 for BNR 2.0 and 0 for all other effects.) This parameter is not supported by chained effects.
- To set the sample rate (not supported by chained effects), specify NVAFX_PARAM_INPUT_SAMPLE_RATE. This value must be set before querying the frame sizes supported by the effect using NvAFX_GetU32 or NvAFX_GetU32List.
- (Denoiser v2 only) To set the version of the denoiser effect, specify NVAFX_PARAM_EFFECT_VERSION.
- Linux only
  - To set the number of audio streams, specify NVAFX_PARAM_NUM_STREAMS.
  - In a multi-GPU environment, to run the constituent effects of a chained effect on separate GPUs, specify NVAFX_PARAM_CHAINED_EFFECT_GPU_LIST.
- Windows only
  - To set the output sample rate (not supported by chained effects), specify NVAFX_PARAM_OUTPUT_SAMPLE_RATE.
  - To create and manage the CUDA context used by the SDK, set the NVAFX_PARAM_USER_CUDA_CONTEXT parameter to 1 (the default value is 0).
  - To disable CUDA graphs, set the NVAFX_PARAM_DISABLE_CUDA_GRAPH parameter to 1 (the default value is 1).
An unsigned integer value that specifies the value for the selector.

To set the model, call the NvAFX_SetString() function with the following parameters:

Previously created effect handle.
The selector string for the parameter to be set, for example NVAFX_PARAM_MODEL_PATH.
A null-terminated string specifying the path to the model file.
- For Linux:
  - Each model file supports a specific sample rate and a maximum number of audio streams.
  - Model files for specific GPU compute versions are located in the models/<compute_version> directory in the SDK.
    
    The following GPU compute versions can be used for the following GPUs:
    - Turing (T4): models/sm_75
    - Ampere
      - A100 (ga100 based GPUs): models/sm_80
      - A10 (ga102 or later GPUs): models/sm_86
    - ADA (L4/L40): models/sm_89
    - Hopper (H100): models/sm_90
    - Blackwell
      - B100/B200: models/sm_100
      - RTX PRO 6000: models/sm_120
  - The specified model should match the sample rate and the specified number of audio streams.
  - The model file name uses the following format: <effect>_<samplerate>_<max-streams>.trtpkg
    - For convenience, each folder includes symlinks, for example denoiser_16k.trtpkg and denoiser_48k.trtpkg, that point to the actual models.
    - samplerate can be 8k, 16k or 48k.
    - The number of audio streams must be in the range 1 to max-streams (both inclusive).
    - The model gives the best throughput performance when the number of audio streams is set to 64 or a multiple of 256 (256, 512, 768, and so on).
    For example, the denoiser_48k_1152.trtpkg model can be used for 48 kHz audio and between 1 and 1152 audio streams, but gives the best throughput for 64, 256, 512, 768, or 1024 streams. Code that uses this model can also use the symlink denoiser_48k.trtpkg in the same folder directly, which allows the underlying model to be changed without code changes.
  - The Voice Font effect has two models:
    - Reference model: Used by reference audio to extract speech features. This model is common for both the High Quality and Low Latency effects.
    - Inference model (or input model): Used for actual inference.
- For Windows:
  - Each model file supports a specific sample rate.
  - Model files for specific GPU compute versions are located in the models directory in the SDK.
  - The Voice Font effect has two models:
    - Reference model: Used by reference audio to extract speech features.
    - Inference model (or input model): Used for actual inference.
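The Linux model directory, file name, and stream-count rules described above can be sketched as a few small helpers. This is only an illustration, not part of the SDK: the function names (ModelDirForCompute, ModelFileName, IsValidStreamCount, IsThroughputOptimal) are hypothetical, the file name layout is assumed from the denoiser_48k_1152.trtpkg example, and the directory lookup handles only the compute versions listed above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an SDK function): maps a CUDA compute capability
// to the models/<compute_version> directory listed in this section.
// Returns an empty string for a capability that the list does not mention.
std::string ModelDirForCompute(int major, int minor) {
  const int sm = major * 10 + minor;
  switch (sm) {
    case 75: case 80: case 86: case 89: case 90: case 100: case 120:
      return "models/sm_" + std::to_string(sm);
    default:
      return "";
  }
}

// Hypothetical helper: builds a model file name, assuming the layout
// <effect>_<samplerate>_<max-streams>.trtpkg.
std::string ModelFileName(const std::string& effect,
                          const std::string& sample_rate,
                          unsigned max_streams) {
  assert(max_streams >= 1);  // a model supports at least one stream
  return effect + "_" + sample_rate + "_" + std::to_string(max_streams) +
         ".trtpkg";
}

// True when num_streams lies in the inclusive range [1, max_streams].
bool IsValidStreamCount(unsigned num_streams, unsigned max_streams) {
  return num_streams >= 1 && num_streams <= max_streams;
}

// True when num_streams is 64 or a non-zero multiple of 256, the values
// this section describes as giving the best throughput.
bool IsThroughputOptimal(unsigned num_streams) {
  return num_streams == 64 || (num_streams != 0 && num_streams % 256 == 0);
}
```

With these helpers, ModelDirForCompute(8, 6) + "/" + ModelFileName("denoiser", "48k", 1152) gives the relative path of the model from the example above for an sm_86 GPU. Whether a given combination of effect, sample rate, and stream count actually ships with the SDK still has to be checked against the contents of the models directory.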
For chained effects, call the NvAFX_SetStringList function with the following parameters:
- The previously created effect handle.
- An array of null-terminated strings, each specifying the path to the model file of the effect to be chained.
  
  For example, for a Denoiser 16k + Superres 16k to 48k chain, pass an array that contains the following two paths:
  - The path to the 16k Denoiser model.
  - The path to the 16k to 48k Superres model.
  
  The model paths follow the same conventions as those of the standalone effects.
- The length of the array.
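When the model paths are held as std::string objects, the array of null-terminated strings and its length can be prepared as in the following sketch. ToCStringArray is a hypothetical helper, not an SDK function, and the NvAFX_SetStringList call shown in the trailing comment assumes the NVAFX_PARAM_MODEL_PATH selector and an argument order of handle, selector, array, length; check both against the SDK header before relying on them.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not an SDK function): returns one pointer per path,
// in the same order, each pointing at that string's null-terminated data.
// The pointers remain valid only while `paths` is alive and unmodified.
std::vector<const char*> ToCStringArray(const std::vector<std::string>& paths) {
  std::vector<const char*> out;
  out.reserve(paths.size());
  for (const std::string& path : paths) {
    out.push_back(path.c_str());
  }
  return out;
}

// Assumed usage (signature and selector are assumptions, see above):
//   std::vector<std::string> chain = {denoiser_16k_path, superres_16k_to_48k_path};
//   std::vector<const char*> c_paths = ToCStringArray(chain);
//   err = NvAFX_SetStringList(handle, NVAFX_PARAM_MODEL_PATH,
//                             c_paths.data(), c_paths.size());
```

Keeping the std::string vector alive until the call returns matters here, because the pointer array only borrows the string storage.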

For example, the following code sets the sample rate to sample_rate, the model path to the path held in model_file, and the number of audio streams to num_streams:

NvAFX_Status err;

// Set sample rate (Linux only)
err = NvAFX_SetU32(handle, NVAFX_PARAM_INPUT_SAMPLE_RATE, sample_rate);

// Set model path
err = NvAFX_SetString(handle, NVAFX_PARAM_MODEL_PATH, model_file.c_str());

// Set number of audio streams (Linux only)
err = NvAFX_SetU32(handle, NVAFX_PARAM_NUM_STREAMS, num_streams);