The C++ Basic TTS Sample

The C++ TTS sample, nvigi.basic.tts.cxx, demonstrates text-to-speech synthesis using the NVIGI SDK with modern C++ interfaces. It showcases voice cloning, real-time audio playback, and both synchronous and asynchronous generation modes with support for multiple backends.

NOTE: This sample supports multiple backends (D3D12, CUDA, and Vulkan), allowing it to run on various hardware configurations.

Features

Voice Cloning: Clone any voice using a target spectrogram
Real-Time Playback: Play generated audio in real-time using DirectSound (Windows)
WAV File Output: Save generated speech to WAV files
Async/Polling API: Non-blocking operations perfect for game loops
Multiple Backends: D3D12, CUDA, or Vulkan backends
Language Support: Multiple languages (en, en-us, en-uk, es, de)
Speed Control: Adjust speech speed from 0.5x to 2.0x
Quality Control: Configure timesteps for quality/speed tradeoff

Download Required Models

The TTS sample requires the following model:

Plugin	Model Name	GUID
nvigi.plugin.tts.asqflow-trt	Riva Magpie-TTS-Flow	81320D1D-DF3C-4CFC-B9FA-4D3FF95FC35F
nvigi.plugin.tts.asqflow-ggml.*	Riva Magpie-TTS-Flow (GGML)	16EEB8EA-55A8-4F40-BECE-CE995AF44101

Important: You also need a target voice spectrogram file. The SDK test data includes sample spectrograms in <SDK_TEST>/nvigi.tts/asqflow/mel_spectrograms_targets/.

See the top-level documentation that shipped with your development pack for information on how to download these models and test data.

Building the Sample

The sample is built as part of the SDK build process. After building, copy the binaries using:

copy_sdk_binaries.bat <cfg>

This ensures all DLLs and the executable are in the same directory (bin\x64).

How to Use the TTS Sample

Basic Speech Generation

Open a command prompt in <SDK_ROOT>
Run the command:

bin\x64\nvigi.basic.tts.cxx.exe --models <SDK_MODELS> --sdk bin\x64 --target <SDK_TEST>/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin

In a standard layout binary development pack or GitHub source tree:

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin

The sample will generate speech for the default text and save it to output.wav
You can play the generated WAV file to hear the synthesized speech

Real-Time Playback Mode

To hear the generated speech in real-time as it’s being generated:

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin --play

Async Mode (Non-Blocking)

For game integration or when you need to continue other processing:

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin --async --play

Command Line Options

Usage: nvigi.basic.tts.cxx [options]

  --sdk                     sdk location, if none provided assuming exe location
  --plugin                  plugin location, if none provided assuming sdk location
  -m, --models              model repo location (REQUIRED)
  -t, --threads             number of threads (default: 8)
  --backend                 backend to use - d3d12, cuda, vulkan (default: d3d12)
  --guid                    TTS model guid in registry format (default: {16EEB8EA-55A8-4F40-BECE-CE995AF44101})
  --vram                    the amount of vram to use in MB (default: 2048)
  --log-level               logging level 0-2 (default: 0)
  --text                    text to synthesize (default: "Hello! This is a test of the text to speech system.")
  --target                  path to target voice spectrogram (REQUIRED)
  --output                  output WAV file path (default: output.wav)
  --speed                   speech speed (0.5 - 2.0) (default: 1.0)
  --language                language code (en, en-us, en-uk, es, de) (default: en)
  --timesteps               number of timesteps for TTS inference (16-32) (default: 16)
  --async                   use async mode (polled, non-blocking)
  --play                    play audio in real-time using DirectSound
  --print-system-info       print system information

Examples

Generate speech with custom text:

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin --text "Welcome to the NVIGI SDK text to speech system."

Use a different voice:

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/01_F-Jennifer_20s_se.bin --text "Hello world!"

Adjust speech speed (slower):

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin --speed 0.75

Adjust speech speed (faster):

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin --speed 1.5

Higher quality (more timesteps, slower):

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin --timesteps 32

Lower quality (fewer timesteps, faster):

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin --timesteps 16

Use Vulkan backend:

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin --backend vulkan

Spanish language:

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin --language es --text "Hola mundo"

Save to custom output file:

bin\x64\nvigi.basic.tts.cxx.exe --models data/nvigi.models --sdk bin\x64 --target data/nvigi.test/nvigi.tts/asqflow/mel_spectrograms_targets/03_M-Tom_Sawyer_15s_se.bin --output my_speech.wav

Audio Format

The generated audio uses the following format:

Sample Rate: 22050 Hz (22.05 kHz)
Bit Depth: 16-bit PCM
Channels: Mono

API Patterns Demonstrated

Synchronous (Blocking) Generation

The sample demonstrates a simple blocking pattern where generation blocks until complete:

instance->generate(
    text,
    target_path,
    config,
    [&wav_writer](const int16_t* audio, size_t samples, ExecutionState state) -> ExecutionState {
        if (state == ExecutionState::DataPending || state == ExecutionState::Done) {
            // Write audio chunk to file
            wav_writer.write_samples(audio, samples);
            
            // Optionally play in real-time
            AudioPlayer::play_audio(audio, samples);
        }
        return state;  // Continue
    }
);

Asynchronous (Non-Blocking) Generation

The sample also demonstrates a polling-based async pattern perfect for game loops:

auto op = instance->generate_async(text, target_path, config);

// Game loop
while (!op.is_complete()) {
    // Try to get results (non-blocking)
    if (auto result = op.try_get_results()) {
        if (!result->audio.empty()) {
            // Write audio chunk to file
            wav_writer.write_samples(result->audio.data(), result->audio.size());
            
            // Play in real-time
            AudioPlayer::play_audio(result->audio.data(), result->audio.size());
            
            if (result->state == ExecutionState::Done) {
                break;
            }
        }
    }
    
    // Continue game logic
    render_frame();
    update_physics();
    process_input();
    
    // Small sleep to avoid busy-wait
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}

Voice Cloning

Voice cloning is achieved using target spectrogram files. These files contain the acoustic characteristics of the voice you want to clone. The SDK test data includes several sample voices:

01_F-Jennifer_20s_se.bin - Female voice (Jennifer)
03_M-Tom_Sawyer_15s_se.bin - Male voice (Tom Sawyer)
Additional samples in <SDK_TEST>/nvigi.tts/asqflow/mel_spectrograms_targets/

To use a different voice, simply change the --target parameter to point to a different spectrogram file.

Quality vs Speed Tradeoff

The --timesteps parameter controls the quality/speed tradeoff:

16 timesteps: Faster generation, slightly lower quality (default)
24 timesteps: Balanced quality and speed
32 timesteps: Highest quality, slower generation

For real-time applications, 16-24 timesteps is recommended. For offline generation where quality is paramount, use 32 timesteps.

Troubleshooting

“Failed to create TTS instance”

Verify the model GUID exists in your models directory
Check that you have sufficient VRAM (try reducing --vram)
Ensure the backend is properly installed (e.g., D3D12 requires Windows 10+)

Missing target spectrogram file

Verify the path to the spectrogram file is correct
Check that the test data has been downloaded
Use an absolute path if relative paths aren’t working

No audio playback with `--play`

Real-time playback only works on Windows with DirectSound
Check that your audio output device is working
Try without --play and verify the WAV file is generated correctly

Out of memory errors

Reduce VRAM budget: --vram 1024
Close other GPU-intensive applications
Try a smaller text input

Poor audio quality

Increase timesteps: --timesteps 32
Check the quality of the target spectrogram file
Verify the language parameter matches your text

Programming Notes

This sample demonstrates:

Modern C++ wrapper interfaces (nvigi::tts::Instance)
Blocking generation with streaming callbacks
Non-blocking async operations with polling (perfect for games)
Real-time audio playback using DirectSound
WAV file writing with proper headers
Builder pattern for runtime configuration
RAII-based resource management
std::expected for error handling

Run in Debugger

To run nvigi.basic.tts.cxx in the debugger:

One-time setup in the project file:
1. In the MSVC IDE, edit the project config settings for nvigi/samples/nvigi.basic.cxx/tts
2. Navigate to the “Debugging” settings
3. Set “Command” to <SDK_ROOT>\bin\x64\nvigi.basic.tts.cxx.exe
4. Set “Command Arguments” as needed (see command line options above)
5. Set “Working Directory” to <SDK_ROOT>/bin/x64
Build the desired non-production config; Release is recommended
After each (re-)build, re-run copy_sdk_binaries.bat <cfg>
The sample can now be run in the debugger