# Tutorials for NVIDIA Speech NIM Microservices
These tutorials guide you through deploying and using NVIDIA Speech NIM microservices for the first time. By the end, you will understand the NIM deployment workflow, know how to interact with each service through its APIs, and be ready to integrate speech AI into your own applications.
Each tutorial follows the same pattern: deploy a NIM container, verify it is running, and send requests. This consistency is intentional. After you complete one tutorial, the others will feel familiar because every Speech NIM shares the same operational model.
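The three steps above can be sketched as shell commands. This is an illustrative outline only: the image name, tag, container name, and port mappings below are assumptions, not values from this page; substitute the exact values given in the tutorial you are following.

```shell
# Sketch of the shared workflow (image name, tag, container name, and
# ports are illustrative placeholders -- take the real values from the
# tutorial you are following).

export NGC_API_KEY=<your-ngc-api-key>   # obtained during NGC Access Setup

# 1. Deploy: pull and start the NIM container (requires an NVIDIA GPU).
docker run -d --name=speech-nim --gpus all \
  -e NGC_API_KEY \
  -p 9000:9000 -p 50051:50051 \
  nvcr.io/nim/nvidia/<speech-nim-image>:<tag>

# 2. Verify: probe the HTTP readiness endpoint until the service is up.
curl -s http://localhost:9000/v1/health/ready

# 3. Send requests: use the gRPC (port 50051) or HTTP (port 9000)
#    clients shown in each tutorial.
```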
## Before You Begin
Complete these setup steps before starting any tutorial:
- Prerequisites: Hardware, drivers, and software requirements.
- NGC Access Setup: Get an NGC API key and log in to the container registry.
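Logging in to the NGC registry follows the standard Docker pattern: the username is the literal string `$oauthtoken` and the password is your NGC API key. The `NGC_API_KEY` environment variable name below is an assumption; use however you have stored your key.

```shell
# Log in to the NGC container registry (nvcr.io).
# Username is the literal string $oauthtoken (single quotes prevent
# shell expansion); the password is your NGC API key.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```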
## Recommended Learning Path
Start with the ASR tutorial, then progress through TTS and NMT. Each tutorial builds familiarity with the shared NIM workflow while introducing service-specific concepts.
| Order | Tutorial | You will learn | Time |
|---|---|---|---|
| 1 | ASR | How NIM containers deploy models, the difference between streaming and offline transcription, and how to use gRPC and HTTP clients. | 30-45 min |
| 2 | TTS | How to select voices and languages, the difference between offline and streaming synthesis, and when to use each API. | 30-45 min |
| 3 | NMT | How to translate between languages, control translation output with exclusion tags and custom dictionaries, and handle batched requests. | 20-30 min |
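Since the ASR tutorial introduces both gRPC and HTTP clients, it can help to inspect a running container's gRPC surface before starting. The sketch below uses `grpcurl`, a third-party tool not covered on this page, and assumes gRPC is mapped to port 50051; adjust to your actual `docker run` mapping.

```shell
# List the gRPC services a running Speech NIM container exposes.
# Requires grpcurl (a separate install); port 50051 is an assumed default.
grpcurl -plaintext localhost:50051 list
```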
## Supplementary
- Running on WSL2: Learn how to run Speech NIM microservices on Windows with WSL2, including port forwarding for remote clients.
- Stopping the Microservice: Learn how to stop and remove the microservices.
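Stopping a NIM container uses ordinary Docker lifecycle commands. The container name `speech-nim` below is an assumption; use whatever name you passed to `docker run --name`.

```shell
# Stop the running container, then remove it to free the name.
# "speech-nim" is an assumed container name -- substitute your own.
docker stop speech-nim
docker rm speech-nim
```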