The C++ Basic ASR Sample
The C++ ASR sample, nvigi.basic.asr.cxx, demonstrates automatic speech recognition (ASR) using the NVIGI SDK with modern C++ interfaces. It showcases both complete audio transcription and real-time streaming modes, allowing you to record audio from a microphone and convert speech to text.
NOTE: This sample supports multiple backends (D3D12, CUDA, and Vulkan), allowing it to run on various hardware configurations.
Features
Complete Audio Mode: Record a fixed duration (5 seconds) and transcribe the complete audio clip
Streaming Mode: Real-time continuous audio streaming with incremental transcription
Multiple Backend Support: D3D12, CUDA, or Vulkan backends
Language Support: Auto-detection or specific language selection (en, es, fr, etc.)
Translation: Automatic translation to English if desired
WAV File Export: Save recorded audio to WAV files for debugging
Download Required Models
The ASR sample requires the following model:
Plugin |
Model Name |
GUID |
|---|---|---|
nvigi.plugin.asr.ggml.* |
Whisper Small |
5CAD3A03-1272-4D43-9F3D-655417526170 |
See the top-level documentation that shipped with your development pack for information on how to download these models.
Building the Sample
The sample is built as part of the SDK build process. After building, copy the binaries using:
copy_sdk_binaries.bat <cfg>
This ensures all DLLs and the executable are in the same directory (bin\x64).
How to Use the ASR Sample
Complete Audio Mode (Default)
In this mode, you record a fixed duration (5 seconds) of audio and then transcribe it as a complete clip.
Open a command prompt in
<SDK_ROOT>Run the command:
bin\x64\nvigi.basic.asr.cxx.exe --models <SDK_MODELS> --sdk bin\x64
In a standard layout binary development pack or GitHub source tree:
bin\x64\nvigi.basic.asr.cxx.exe --models data/nvigi.models --sdk bin\x64
At the prompt, type
recordto start a 5-second recordingSpeak into your microphone during the recording
The transcribed text will be displayed after recording completes
Type
quitorexitto exit the application
Streaming Mode
Streaming mode provides real-time continuous audio streaming with incremental transcription.
Run the sample with the
--streamingflag:
bin\x64\nvigi.basic.asr.cxx.exe --models data/nvigi.models --sdk bin\x64 --streaming
At the prompt, type
streamto start real-time streamingSpeak into your microphone - you’ll see transcription appear in real-time
Press Enter to stop streaming
Type
quitorexitto exit the application
Command Line Options
Usage: nvigi.basic.asr.cxx [options]
--sdk sdk location, if none provided assuming exe location
--plugin plugin location, if none provided assuming sdk location
-m, --models model repo location (REQUIRED)
-t, --threads number of threads (default: 8)
--fa, --flash-attention use flash attention
--backend backend to use - d3d12, cuda, vulkan (default: d3d12)
--guid ASR model guid in registry format (default: {5CAD3A03-1272-4D43-9F3D-655417526170})
--vram the amount of vram to use in MB (default: 2048)
--log-level logging level 0-2 (default: 0)
--language language code (en, es, fr, auto, etc.) (default: en)
--detect-lang auto-detect language
--translate translate to English
--streaming use streaming mode (experimental)
--save-wav save audio to file (default: c:/test.wav)
--print-system-info print system information
Examples
Basic transcription with English language:
bin\x64\nvigi.basic.asr.cxx.exe --models data/nvigi.models --sdk bin\x64 --language en
Auto-detect language and translate to English:
bin\x64\nvigi.basic.asr.cxx.exe --models data/nvigi.models --sdk bin\x64 --detect-lang --translate
Use Vulkan backend with streaming mode:
bin\x64\nvigi.basic.asr.cxx.exe --models data/nvigi.models --sdk bin\x64 --backend vulkan --streaming
Save recorded audio to a WAV file:
bin\x64\nvigi.basic.asr.cxx.exe --models data/nvigi.models --sdk bin\x64 --save-wav recording.wav
Audio Format
The sample uses the following audio format:
Sample Rate: 16000 Hz (16 kHz)
Bit Depth: 16-bit PCM
Channels: Mono
Input: Windows DirectX default recording device
Troubleshooting
“Failed to start recording”
Check that a microphone is connected and enabled in Windows sound settings
Verify the default recording device is set correctly in Windows
No audio recorded
Ensure the microphone is not muted
Check microphone permissions in Windows privacy settings
Try speaking louder or moving closer to the microphone
Out of memory errors
Try reducing the
--vramparameter valueClose other GPU-intensive applications
Programming Notes
This sample demonstrates:
Modern C++ wrapper interfaces (
nvigi::asr::Instance)Blocking transcription with callbacks
Non-blocking async operations with polling
Real-time audio streaming with incremental results
Builder pattern for configuration
RAII-based resource management
std::expectedfor error handling
Run in Debugger
To run nvigi.basic.asr.cxx in the debugger:
One-time setup in the project file:
In the MSVC IDE, edit the project config settings for
nvigi/samples/nvigi.basic.cxx/asrNavigate to the “Debugging” settings
Set “Command” to
<SDK_ROOT>\bin\x64\nvigi.basic.asr.cxx.exeSet “Command Arguments” as needed (see command line options above)
Set “Working Directory” to
<SDK_ROOT>/bin/x64
Build the desired non-production config; Release is recommended
After each (re-)build, re-run
copy_sdk_binaries.bat <cfg>The sample can now be run in the debugger