About the Studio Voice Effect

Speech recorded with low-end microphones in less-than-ideal acoustic environments can contain distortions such as reverberation or static noise. The Studio Voice effect enhances and recovers speech degraded in this way. It also recovers speech degraded by noise-reduction filters or beamforming algorithms. Because the degradation and artifacts are removed or reduced, the resulting audio sounds closer to a recording made in a professional studio setup.

This effect has two variations:

Studio Voice High Quality
Studio Voice Low Latency

Studio Voice High Quality

This effect is intended for offline use cases, such as post-processing video and audio files.

This effect supports the following configurations:

16-kHz degraded speech to 16-kHz enhanced speech.
48-kHz degraded speech to 48-kHz enhanced speech.

To run the sample application on Windows for this effect, use the following commands:

# (One time, initial setup): Download models using models/download_models.ps1
powershell -ExecutionPolicy Bypass -File ./download_models.ps1 --gpu_architecture <gpu> --effects studio_voice-16k,studio_voice-48k

# Format: run_effects_demo.bat -g <architecture> -e <effect> -isr <input_sr> -osr <output_sr> -ir <intensity_ratio> -ev <effect_version> -vad <enable_vad>

# 16k effect on Turing GPU
run_effects_demo.bat -g turing -e studio_voice_high_quality -isr 16k -osr 16k

# 48k effect on Ampere GPU
run_effects_demo.bat -g ampere -e studio_voice_high_quality -isr 48k -osr 48k

Note

For more information, see Use the Helper Script to Run the Sample Application.

To run the sample application on Linux for this effect, use the following commands:

# (One time, initial setup): Download models using models/download_models.sh
./download_models.sh --gpu <gpu> --effects studio_voice-16k,studio_voice-48k

# Refer to Section 3.2 for further details
# Format: ./run_effect.sh -g <gpu> -s <sample_rate> -e studio_voice_high_quality

# 16k effect
./run_effect.sh -g t4 -s 16 -e studio_voice_high_quality

# 48k effect
./run_effect.sh -g t4 -s 48 -e studio_voice_high_quality

Note

For more information, see Use the Helper Script to Run the Sample Application.

Note

This effect is not optimized for real-time use cases or for efficient GPU usage. Outputs might have extra latency (up to 6 seconds) and, depending on your system specifications, might not be generated in real time.

This effect uses large input/output chunk sizes (6 seconds per inference, which can be queried using NvAFX_GetU32) and currently does not support multiple batches or delayed streams.
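
As a rough illustration of what a 6-second chunk means in samples, the following Python sketch does the arithmetic for both supported sample rates. It is a sketch only: it does not call the SDK, the function names are hypothetical, and the 6-second value is hard-coded from the text above, whereas an application should query the actual size with NvAFX_GetU32.

```python
from typing import Tuple

CHUNK_SECONDS = 6  # chunk length per inference, taken from the note above


def samples_per_chunk(sample_rate_hz: int, chunk_seconds: int = CHUNK_SECONDS) -> int:
    """Return the number of samples in one input/output chunk."""
    return sample_rate_hz * chunk_seconds


def split_into_chunks(total_samples: int, sample_rate_hz: int) -> Tuple[int, int]:
    """Return (full_chunks, leftover_samples) for a recording of total_samples."""
    return divmod(total_samples, samples_per_chunk(sample_rate_hz))


if __name__ == "__main__":
    for rate_hz in (16_000, 48_000):
        one_minute = 60 * rate_hz  # samples in a one-minute recording
        print(rate_hz, samples_per_chunk(rate_hz), split_into_chunks(one_minute, rate_hz))
```

At 16 kHz one chunk is 96,000 samples, and at 48 kHz it is 288,000 samples. A one-minute recording is exactly ten full chunks at either rate. How a trailing partial chunk should be handled is not covered by this sketch.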

Studio Voice Low Latency

This effect is intended for online or real-time use cases (for example, online voice conferences or broadcast apps).

This effect supports the following configuration:

48-kHz degraded speech to 48-kHz enhanced speech.

Note

This effect is optimized specifically for real-time use cases, where input audio is provided in small chunks and output audio is expected back with minimal delay.

Do not use this effect for offline use cases (such as post-processing), because it processes audio in very small input chunks. Use the high-quality effect for such cases: it operates on larger chunks of the input per processing call, so it requires fewer processing calls and is faster than the low-latency effect.
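
To give a feel for the difference in call counts, the Python sketch below compares the number of processing calls needed for the same recording at two chunk sizes. The 6-second chunk is the value documented for the high-quality effect. The low-latency chunk size is not stated in this section, so the 10 ms value is a placeholder assumption, and the function and constant names are hypothetical.

```python
HIGH_QUALITY_CHUNK_MS = 6000       # 6 seconds per inference, as documented for the high-quality effect
ASSUMED_LOW_LATENCY_CHUNK_MS = 10  # hypothetical placeholder; the real value is not stated here


def processing_calls(duration_ms: int, chunk_ms: int) -> int:
    """Return how many processing calls cover duration_ms, rounding a partial chunk up."""
    if duration_ms < 0 or chunk_ms <= 0:
        raise ValueError("duration_ms must be >= 0 and chunk_ms must be > 0")
    return -(-duration_ms // chunk_ms)  # ceiling division on integers


if __name__ == "__main__":
    ten_minutes_ms = 10 * 60 * 1000
    print("high quality:", processing_calls(ten_minutes_ms, HIGH_QUALITY_CHUNK_MS))
    print("low latency (assumed chunk):", processing_calls(ten_minutes_ms, ASSUMED_LOW_LATENCY_CHUNK_MS))
```

Under that assumed chunk size, a ten-minute file needs 100 calls with the high-quality effect and 60,000 with the low-latency effect. The call count alone does not predict run time, which also depends on the cost of each call.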

To run the sample application on Windows for this effect, use the following commands:

# (One time, initial setup): Download models using models/download_models.ps1
powershell -ExecutionPolicy Bypass -File ./download_models.ps1 --gpu_architecture <gpu> --effects studio_voice-48k

# Format: run_effects_demo.bat -g <architecture> -e <effect> -isr <input_sr> -osr <output_sr> -ir <intensity_ratio> -ev <effect_version> -vad <enable_vad>

# 48k effect on Ampere GPU
run_effects_demo.bat -g ampere -e studio_voice_low_latency -isr 48k -osr 48k

Note

For more information, see Use the Helper Script to Run the Sample Application.

To run the sample application on Linux for this effect, use the following commands:

# (One time, initial setup): Download models using models/download_models.sh
./download_models.sh --gpu <gpu> --effects studio_voice-48k

# Refer to Section 3.2 for further details
# Format: ./run_effect.sh -g <gpu> -s <sample_rate> -e studio_voice_low_latency

# 48k effect
./run_effect.sh -g t4 -s 48 -e studio_voice_low_latency

Note

For more information, see Use the Helper Script to Run the Sample Application.

Note

This effect is not optimized for efficient GPU usage. Outputs might have extra latency (up to 110 ms) and might not be generated in real time on lower-end GPUs. Also, this effect does not currently support multiple batches or delayed streams.
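
The Linux invocations shown for both effects share one pattern, so a script that drives run_effect.sh can build the argument list and reject unsupported sample rates before launching anything. The Python sketch below is one hypothetical way to do that. The helper name and the validation table are assumptions for illustration (the table mirrors the supported configurations listed for each effect), and only the -g, -s, and -e flags shown in the commands above are used.

```python
from typing import List

# Supported sample rates in kHz, copied from the configurations listed for each effect.
SUPPORTED_RATES_KHZ = {
    "studio_voice_high_quality": (16, 48),
    "studio_voice_low_latency": (48,),
}


def build_run_effect_command(gpu: str, sample_rate_khz: int, effect: str) -> List[str]:
    """Build the run_effect.sh argument list, or raise ValueError for an unsupported combination."""
    if effect not in SUPPORTED_RATES_KHZ:
        raise ValueError(f"unknown effect: {effect}")
    if sample_rate_khz not in SUPPORTED_RATES_KHZ[effect]:
        raise ValueError(f"{effect} does not support {sample_rate_khz} kHz")
    return ["./run_effect.sh", "-g", gpu, "-s", str(sample_rate_khz), "-e", effect]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to execute.
    print(" ".join(build_run_effect_command("t4", 48, "studio_voice_low_latency")))
```

Returning a list rather than a single string lets the caller hand the arguments to a process launcher without shell quoting concerns.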