Nemotron Speech Agent Skills#
NVIDIA Nemotron Speech provides a set of curated agentic skills for AI coding assistants that work with NVIDIA Speech NIM microservices. The skills help your AI coding agents choose models, prepare environments, deploy containers, run inference, customize ASR pipelines, and troubleshoot ASR, TTS, and NMT workflows.
The skills are published in the Nemotron Speech Skills GitHub repository. Clone the repository with the following command:
git clone https://github.com/nvidia-riva/Nemotron-speech-skills.git
The following table lists the key repository files and directories:
| Path | Description |
|---|---|
| | Agentic skill package for NVIDIA Nemotron Speech workflows. |
| | Skill definition, trigger phrases, routing rules, guardrails, and links to the bundled references. |
| | Condensed workflow references for setup, readiness checks, model selection, ASR, custom ASR, pipeline tuning, TTS, and NMT. |
| | Skill metadata, use case, output type, and evaluation summary. |
| | Evaluation tasks for activation, routing, and safety behavior. |
| | Claude Code plugin manifest. |
| | Codex plugin manifest. |
| | Marketplace metadata for agents that support the agentskills.io layout. |
Prerequisites#
To use the skill with an AI coding assistant:
An AI coding assistant with skill, plugin, marketplace, or project-rule support, such as Claude Code, Codex, Cursor, Windsurf, or another agentskills.io-compatible tool.
Access to the Nemotron-speech-skills repository.
A GPU, Speech NIM container, and NGC key are not required to install the skill or ask planning questions.
To run commands or code generated with the skill:
NVIDIA AI Enterprise entitlement for self-hosted Speech NIM containers.
A supported NVIDIA GPU, NVIDIA driver, Docker, and NVIDIA Container Toolkit for local deployments.
An NGC_API_KEY for pulling self-hosted containers from nvcr.io, or an NVIDIA_API_KEY for cloud-hosted build.nvidia.com inference.
The nvidia-riva-client Python package for gRPC client examples.
A process to review, test, and security-validate generated commands before production use.
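The runtime prerequisites above can be spot-checked before running generated commands. The following is a minimal sketch that uses only the Python standard library. The function name preflight is hypothetical, and the check for the client package assumes that nvidia-riva-client installs a top-level package named riva. The sketch does not verify the NVIDIA AI Enterprise entitlement, the driver version, or the NVIDIA Container Toolkit.

```python
import importlib.util
import os
import shutil


def preflight(env=None):
    """Report which prerequisites for generated commands appear to be present."""
    env = os.environ if env is None else env
    return {
        # Tools expected on PATH for local, self-hosted deployments.
        "docker": shutil.which("docker") is not None,
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
        # Which key is needed depends on the deployment target.
        "NGC_API_KEY": bool(env.get("NGC_API_KEY")),
        "NVIDIA_API_KEY": bool(env.get("NVIDIA_API_KEY")),
        # Assumption: nvidia-riva-client provides a top-level "riva" package.
        "riva client": importlib.util.find_spec("riva") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:15} {'found' if ok else 'missing'}")
```

The report only prints whether each key is set, never its value.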
Agentic Skill#
An agentic skill is a structured knowledge package that an AI coding assistant can discover and activate for a specific domain. It contains task routing rules, reference material, and safety guidance so the assistant can use the correct workflow without the developer pasting the same context into every conversation.
The skills/nemotron-speech/ directory contains one umbrella skill, nemotron-speech. The top-level SKILL.md file is a routing surface. It instructs the assistant to load only the reference file needed for the current task so context stays focused.
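Routing is performed by the assistant as it reads SKILL.md, not by code. Purely as an illustration of the idea (pick one workflow area, then load only that area's reference), the following toy sketch maps a prompt to an area by keyword. The area names follow the workflow areas listed on this page. The keywords, the ROUTES table, and the route function are hypothetical and are not taken from the skill.

```python
# Hypothetical keyword table; order matters because the first match wins.
ROUTES = [
    ("setup", ("install", "set up", "docker", "driver")),
    ("model selection", ("which model", "choose")),
    ("custom ASR", ("fine-tuned", "custom")),
    ("pipeline tuning", ("vad", "diarization", "chunk size")),
    ("TTS", ("tts", "synthesize", "voice")),
    ("NMT", ("nmt", "translate")),
    ("ASR", ("asr", "transcri")),
]


def route(prompt):
    """Return the first workflow area whose keywords appear in the prompt."""
    text = prompt.lower()
    for area, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return area
    return None  # No match: the skill should not activate.


if __name__ == "__main__":
    print(route("Translate English to German with Riva NMT"))
```

A real assistant weighs the whole request rather than matching substrings, so treat this only as a picture of "one task, one reference".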
About the Skill#
The nemotron-speech skill covers the following workflow areas:
| Reference | Use It For |
|---|---|
| | Driver, Docker, NVIDIA Container Toolkit, NGC access, and Riva Python client setup. |
| | GPU compatibility, container health checks, and deployment troubleshooting. |
| | Choosing ASR, TTS, or NMT model families based on use case, language, latency, and deployment target. |
| | Deploying and running ASR NIMs, including Parakeet, Canary, Whisper, and Nemotron ASR Streaming. |
| | Deploying custom NeMo-trained ASR models with |
| | Configuring ASR pipeline options such as VAD, diarization, language models, and chunk size. |
| | Deploying and running TTS NIMs, including Magpie TTS workflows. |
| | Deploying and running NMT workflows, including language pairs and DNT tags. |
Current model names, container IDs, function IDs, voice lists, language pairs, and hardware requirements can change between releases. The skill directs the assistant to verify current details against NVIDIA documentation or build.nvidia.com before recommending release-specific values.
Installing the Skill#
Use the installation method supported by your AI coding assistant.
Claude Code#
Install from the repository marketplace in a Claude Code session:
/plugin marketplace add https://github.com/nvidia-riva/Nemotron-speech-skills.git
/plugin install nemotron-speech@nemotron-speech-skills
Codex#
Install the plugin directly from GitHub:
codex plugin add https://github.com/nvidia-riva/Nemotron-speech-skills.git
Alternatively, add the marketplace to your Codex configuration:
[[marketplaces]]
name = "nemotron-speech-skills"
url = "https://github.com/nvidia-riva/Nemotron-speech-skills.git"
Cursor#
Clone the repository and reference the skill from your project rules:
git clone https://github.com/nvidia-riva/Nemotron-speech-skills.git ~/agent-skills/Nemotron-speech-skills
mkdir -p .cursor/rules
ln -s ~/agent-skills/Nemotron-speech-skills/skills/nemotron-speech .cursor/rules/nemotron-speech
You can also add the skills/nemotron-speech/SKILL.md path through Cursor settings.
Windsurf#
Clone the repository:
git clone https://github.com/nvidia-riva/Nemotron-speech-skills.git ~/agent-skills/Nemotron-speech-skills
Then reference the skill from your project’s .windsurfrules file:
@include ~/agent-skills/Nemotron-speech-skills/skills/nemotron-speech/SKILL.md
Other Compatible Tools#
For assistants that automatically discover skills under a skills/ directory, clone the repository into a location the assistant can access:
git clone https://github.com/nvidia-riva/Nemotron-speech-skills.git
The repository root contains the skills/nemotron-speech/ package and plugin metadata needed by compatible tools.
Verifying the Installation#
After installation, open or restart your AI coding assistant and ask a Speech NIM question such as:
Help me choose a Riva ASR model for low-latency English transcription.
The assistant should route to the nemotron-speech skill, consult the model selection reference, and verify current model details from NVIDIA documentation or build.nvidia.com before giving release-specific guidance.
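If the assistant does not pick up the skill after a local clone or symlink install, a quick first check is whether the clone contains the skill file at the expected path. The following is a minimal sketch; the function name find_skill_file is hypothetical, and the default location is the clone path used in the Cursor and Windsurf steps.

```python
from pathlib import Path


def find_skill_file(repo_root):
    """Return the path to SKILL.md inside a clone, or None if it is absent."""
    candidate = Path(repo_root).expanduser() / "skills" / "nemotron-speech" / "SKILL.md"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Clone location from the Cursor and Windsurf steps; adjust as needed.
    found = find_skill_file("~/agent-skills/Nemotron-speech-skills")
    print(found or "SKILL.md not found; check the clone path")
```

This confirms only that the file exists on disk. Whether the assistant activates the skill still has to be verified with a prompt.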
Updating the Skill#
If you installed from a plugin marketplace, use your assistant’s plugin manager to update or reinstall the plugin when a new repository version is available.
If you installed from a local clone or symlink, update the clone:
cd ~/agent-skills/Nemotron-speech-skills
git pull
If you copied the skill directory into an assistant-specific skills folder, pull the latest repository changes and copy the directory again. Re-copying overwrites local edits, so preserve any intentional customizations before replacing files.
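One way to preserve local edits when re-copying is to move the old copy aside before replacing it. The following is a minimal sketch under that assumption; the function name refresh_skill_copy and the timestamped backup naming are hypothetical, and the source and target paths depend on your assistant.

```python
import shutil
import time
from pathlib import Path


def refresh_skill_copy(source, target):
    """Replace target with a fresh copy of source, keeping a backup of the old copy."""
    source, target = Path(source), Path(target)
    backup = None
    if target.exists():
        # Move the old copy aside so local edits survive the refresh.
        stamp = time.strftime("%Y%m%d-%H%M%S")
        backup = target.with_name(f"{target.name}.bak-{stamp}")
        shutil.move(str(target), str(backup))
    # Copy the freshly pulled skill directory into place.
    shutil.copytree(source, target)
    return backup
```

The function returns the backup path, or None on a first install, so that customizations can be compared against the new files and reapplied by hand.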
Best Practices for AI-Assisted Speech NIM Workflows#
Use precise prompts. Include the target task, such as ASR, TTS, NMT, setup, model selection, custom ASR deployment, or pipeline tuning.
Specify the deployment target. State whether you want cloud-hosted build.nvidia.com inference, a self-hosted Docker container, or guidance for both paths.
Describe constraints. Include language, latency, streaming or offline mode, diarization, word timestamps, voice, translation direction, custom model artifacts, privacy needs, and hardware constraints.
Ask the assistant to verify current product details. Model names, CONTAINER_ID values, NIM_TAGS_SELECTOR profiles, NVCF function IDs, voice names, language pairs, and VRAM requirements should come from current NVIDIA sources.
Protect credentials. Keep NVIDIA_API_KEY and NGC_API_KEY in environment variables or secret managers. Do not paste real keys into chat, logs, source files, or shell history.
Review generated commands. Assistant-generated output is a development starting point. Inspect commands before running them, especially commands that install packages, start containers, write files, or change system configuration.
Provide full error output when debugging. Container logs, health check failures, gRPC errors, and nvidia-smi output help the assistant choose the right troubleshooting path.
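Sharing full error output and protecting credentials can pull in opposite directions, because logs sometimes echo key values. The following is a minimal sketch of one way to reconcile the two: replace the values of the two key variables with placeholders before pasting a log into a chat. The function name redact is hypothetical.

```python
import os


def redact(text, env=None, names=("NGC_API_KEY", "NVIDIA_API_KEY")):
    """Replace the values of the named environment variables in text with placeholders."""
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name)
        if value:
            # Swap every exact occurrence of the secret for its variable name.
            text = text.replace(value, f"<{name}>")
    return text


if __name__ == "__main__":
    sample = "docker login failed for key " + os.environ.get("NGC_API_KEY", "(unset)")
    print(redact(sample))
```

The sketch catches only exact matches. Encoded or truncated forms of a key are not detected, so a manual read-through before sharing is still advisable.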
Sample Prompts#
The following prompts are ordered from basic planning tasks to advanced deployment tasks.
| Prompt | Expected Skill Path |
|---|---|
| “Which Riva model should I use for real-time call-center transcription with low latency and punctuation?” | Model selection, then ASR reference. |
| “Help me set up a fresh Ubuntu machine for Speech NIMs, including Docker, NVIDIA Container Toolkit, NGC login, and the Riva Python client.” | Setup reference. |
| “Use build.nvidia.com Riva ASR from Python to transcribe a WAV file with Parakeet.” | Model selection, then ASR cloud inference. |
| “Deploy a self-hosted Parakeet ASR NIM and show me how to run a WAV through it with gRPC.” | ASR self-hosted deployment and inference. |
| “List available Magpie TTS voices, then synthesize this text to a WAV file.” | TTS deployment or inference. |
| “Translate English to German with Riva NMT and keep the product name NVIDIA untranslated.” | NMT inference with DNT guidance. |
| “I fine-tuned an ASR model in NeMo and have a | Custom ASR deployment. |
| “Tune an ASR pipeline with Silero VAD, Sortformer diarization, a KenLM language model, and smaller chunk size.” | ASR pipeline configuration. |
Expected Output#
When the skill activates, the assistant should:
Identify the specific Speech NIM workflow.
Load SKILL.md and only the relevant reference file for that workflow.
Verify current release-specific facts with NVIDIA documentation or build.nvidia.com when needed.
Produce step-by-step commands, configuration, or code for the selected path.
Use placeholders for secret values and avoid printing real credentials.
Next Steps#
Host the Nemotron Speech MCP Server: Give agents a live control plane for customer-hosted Speech NIM deployments.
Prerequisites: Prepare hardware, drivers, and container runtime requirements.
NGC Access Setup: Generate registry credentials for self-hosted Speech NIM containers.
Install the NVIDIA Riva Python Client: Install the client package used in generated gRPC examples.
Support Matrix: Find currently supported models, profiles, languages, and hardware requirements.
Deploy and Run ASR NIM: Deploy ASR models with Docker.
Deploy and Run TTS NIM: Deploy and run TTS models.
Deploy and Run NMT NIM: Deploy and run NMT models.