Voice-Powered RAG Agent with NVIDIA Nemotron Models
Build a complete end-to-end AI agent that accepts voice input, retrieves multimodal context, reasons with long-context models, and enforces safety guardrails, all using the latest NVIDIA Nemotron open models.
Features
Voice Input: Nemotron Speech ASR for real-time speech-to-text
LangChain 1.0 Agent: Uses langgraph.prebuilt.create_react_agent with automatic looping
RAG as a Tool: On-demand retrieval; the agent decides when to search the knowledge base
Automatic Agent Loop: Can call tools multiple times until it has enough information
Multimodal RAG: Embed and retrieve both text and document images
Smart Reranking: Improve retrieval accuracy by 6-7% with cross-encoder reranking
Image Understanding: Describe visual content in context using vision-language models
Long-Context Reasoning: Generate responses with 1M token context window
Safety Guardrails (Always On): PII detection and content moderation enforced on all inputs/outputs
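The retrieval features above can be pictured as a two-stage pipeline: a cheap vector search shortlists candidates, then a more expensive reranker reorders the shortlist. The following is a minimal sketch in plain Python, assuming toy three-dimensional vectors and a hypothetical word-overlap scorer in place of the Nemotron embedding and reranking models. None of these function names come from the tutorial notebook.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_vec, index, top_k):
    """Stage 1: shortlist the top_k documents by embedding similarity."""
    ranked = sorted(index, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)
    return ranked[:top_k]

def rerank(query, candidates, score_fn, top_n):
    """Stage 2: reorder the shortlist with a (query, text) pair scorer."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc["text"]), reverse=True)
    return ranked[:top_n]

def overlap_score(query, text):
    """Toy stand-in for a cross-encoder: fraction of query words found in text."""
    words = set(query.lower().split())
    return len(words & set(text.lower().split())) / len(words)

# Hypothetical index entries; real embeddings would come from the embedding model.
index = [
    {"text": "quarterly revenue chart for 2024", "vec": [0.9, 0.1, 0.0]},
    {"text": "revenue grew in the fourth quarter", "vec": [0.8, 0.2, 0.1]},
    {"text": "employee onboarding checklist", "vec": [0.0, 0.1, 0.9]},
]
query = "revenue fourth quarter"
shortlist = vector_search([1.0, 0.0, 0.0], index, top_k=2)
best = rerank(query, shortlist, overlap_score, top_n=1)
print(best[0]["text"])
```

In this toy run the vector search ranks the chart document first, and the reranker promotes the second document because it matches every query word. That reordering is the effect the reranking stage is meant to provide.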
Models Used
| Component | Model | Parameters | Deployment |
|---|---|---|---|
| Speech-to-Text | | 600M | Self-hosted (NeMo) |
| Embeddings | | 1.7B | Self-hosted (Transformers) |
| Reranking | | 1.7B | Self-hosted (Transformers) |
| Vision-Language | | 12B | NVIDIA API |
| Reasoning | | 30B | NVIDIA API |
| Safety | | 8B | Self-hosted (Transformers) |
Requirements
Hardware
GPU: NVIDIA GPU with at least 24GB VRAM recommended (for self-hosted models)
CUDA: 11.8 or later
Software
Python 3.10+
PyTorch 2.0+
NVIDIA API Key (for cloud-hosted models)
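A quick preflight check can catch version mismatches before any models are downloaded. The sketch below assumes a hypothetical helper, `meets_minimum`, that compares only the leading major and minor numbers of a version string. In a real environment you might pass it `torch.__version__` instead of the example strings shown here.

```python
import re
import sys

def parse_version(version):
    """Return the leading (major, minor) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))

def meets_minimum(version, minimum):
    """True if the version's (major, minor) is at least the required minimum."""
    return parse_version(version) >= minimum

# Python requirement from the list above: 3.10 or newer.
print("Python 3.10+:", sys.version_info[:2] >= (3, 10))

# Example strings only; local suffixes such as "+cu124" are ignored.
print("PyTorch 2.0+:", meets_minimum("2.5.1+cu124", (2, 0)))
print("PyTorch 2.0+:", meets_minimum("1.13.1", (2, 0)))
```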
Quick Start
1. Clone the Repository
git clone https://github.com/NVIDIA-NeMo/Nemotron.git
cd Nemotron/use-case-examples/nemotron-voice-rag-agent-example
2. Set Up Environment
Option A: Standard CUDA (RTX, A100, etc.):
uv sync --extra cuda --index-url https://download.pytorch.org/whl/cu124
Option B: DGX Spark (GB10):
uv sync --extra cuda --index-url https://download.pytorch.org/whl/cu130
Note: nemo_toolkit[asr] may require specific PyTorch versions. If you encounter dependency conflicts, install PyTorch first:
# For Spark/GB10 systems
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130
uv sync
# For standard CUDA systems
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
uv sync
3. Configure API Key
export NVIDIA_API_KEY="your-nvidia-api-key"
Get your API key from NVIDIA NGC.
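Before launching the notebook, it can help to confirm that the key is actually visible to Python, since a missing key typically only surfaces later as a failed API call. This is a small sketch with a hypothetical helper, `get_api_key`; the optional `env` argument exists only so the check can be exercised without touching the real environment.

```python
import os

def get_api_key(env=None):
    """Return NVIDIA_API_KEY from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "NVIDIA_API_KEY is not set; export it before launching the notebook."
        )
    return key

# Demonstration with explicit mappings instead of the real environment.
try:
    get_api_key({})
except RuntimeError as err:
    print(err)

key = get_api_key({"NVIDIA_API_KEY": "your-nvidia-api-key"})
print("Key found, length:", len(key))  # avoid printing the key itself
```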
4. Run the Tutorial
jupyter notebook voice_rag_agent_tutorial.ipynb
Project Structure
nemotron-voice-rag-agent-example/
├── voice_rag_agent_tutorial.ipynb   # Main tutorial notebook
├── README.md                        # This file
├── requirements.txt                 # Python dependencies
└── BlogSkeleton/                    # Blog content and model docs
    ├── BLOG.md
    ├── BLOG_UPDATED.md
    ├── Code Snippets/
    └── Model Information/
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│           Voice-Powered LangChain 1.0 Agent with RAG Tool           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Voice Input → Nemotron Speech ASR → Text Query                     │
│       ↓                                                             │
│  Input Safety Check (ALWAYS ENFORCED)                               │
│       ↓                                                             │
│  ┌─────────────────────────────────────────────────────┐            │
│  │             LangGraph ReAct Agent Loop              │            │
│  │       (langgraph.prebuilt.create_react_agent)       │            │
│  │                                                     │            │
│  │  Agent (nemotron-3-nano-30b-a3b)                    │            │
│  │    │                                                │            │
│  │    ├─> Decide: Need more info?                      │            │
│  │    │                                                │            │
│  │    ├─> YES: Call RAG Tool ────┐                     │            │
│  │    │      ├── Embed           │                     │            │
│  │    │      ├── Vector Search   │                     │            │
│  │    │      ├── Rerank          │ LOOP                │            │
│  │    │      └── Describe Images │ UNTIL               │            │
│  │    │                          │ SATISFIED           │            │
│  │    ├─< Tool Result ───────────┘                     │            │
│  │    │                                                │            │
│  │    └─> NO: Generate final answer                    │            │
│  │                                                     │            │
│  └─────────────────────────────────────────────────────┘            │
│       ↓                                                             │
│  Output Safety Check (ALWAYS ENFORCED)                              │
│       ↓                                                             │
│  Safe Text Output                                                   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
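The control flow in the diagram can be approximated in a few lines of plain Python. In the tutorial the loop itself is handled by langgraph.prebuilt.create_react_agent; the sketch below is a hand-rolled approximation for illustration only, and `run_agent`, `toy_decide`, `toy_rag_tool`, and `toy_is_safe` are hypothetical stand-ins for the agent loop, the reasoning model, the RAG tool, and the safety model.

```python
def run_agent(query, decide, rag_tool, is_safe, max_steps=5):
    """Guarded agent loop: check input, call the tool until answered, check output."""
    if not is_safe(query):
        return "Blocked by input safety check."
    context = []
    for _ in range(max_steps):
        action, payload = decide(query, context)
        if action == "answer":
            return payload if is_safe(payload) else "Blocked by output safety check."
        # Any other action is treated as a search request for the RAG tool.
        context.append(rag_tool(payload))
    return "Stopped: step limit reached."

# Toy knowledge base (hypothetical); a real tool would embed, search, and rerank.
KNOWLEDGE = {"gpu": "At least 24GB of VRAM is recommended for self-hosted models."}

def toy_decide(query, context):
    if not context:
        return "search", query      # no evidence yet: ask the tool
    return "answer", context[-1]    # enough evidence: answer from it

def toy_rag_tool(search_query):
    hits = [text for key, text in KNOWLEDGE.items() if key in search_query.lower()]
    return hits[0] if hits else "No matching passage found."

def toy_is_safe(text):
    return "password" not in text.lower()

print(run_agent("How much GPU memory do I need?", toy_decide, toy_rag_tool, toy_is_safe))
print(run_agent("What is the admin password?", toy_decide, toy_rag_tool, toy_is_safe))
```

The `max_steps` cap is an assumption added here so that a model which never decides to answer cannot loop forever.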
Tutorial Steps
Environment Setup: Install dependencies and configure API keys
Multimodal RAG: Build embeddings and vector store for text + images
Speech Input: Add real-time speech transcription with Nemotron ASR
Safety Guardrails: Implement PII detection and content moderation
Reasoning LLM: Configure Nemotron for agent decision-making
LangChain 1.0 Agent: Create ReAct agent with automatic looping
Define RAG as a tool (not a fixed workflow step)
Use langgraph.prebuilt.create_react_agent
Agent automatically loops until it can answer
Safety enforced on all inputs and outputs
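The "safety enforced on all inputs and outputs" step can be illustrated with a wrapper that checks text on the way in and on the way out. The tutorial uses a Nemotron safety model for this; the sketch below swaps in two simple regular expressions purely for illustration, and `PII_PATTERNS`, `find_pii`, and `guarded` are hypothetical names. Regex matching is far less capable than a model-based check.

```python
import re

# Illustrative patterns only; a safety model covers far more cases.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text):
    """Return the sorted names of the PII patterns that match text."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))

def guarded(handler):
    """Wrap a text handler so both its input and its output are checked."""
    def wrapper(text):
        if find_pii(text):
            return "Input blocked: possible PII detected."
        reply = handler(text)
        if find_pii(reply):
            return "Output blocked: possible PII detected."
        return reply
    return wrapper

@guarded
def echo_agent(text):
    return f"You asked: {text}"

@guarded
def leaky_agent(text):
    return "Contact jane.doe@example.com for details."

print(echo_agent("What does the revenue chart show?"))
print(echo_agent("My SSN is 123-45-6789"))
print(leaky_agent("Who owns this report?"))
```

Wrapping the handler, rather than calling the check inside it, keeps the guardrail always on: the agent code cannot skip it by accident.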
Use Cases
Enterprise Q&A: Answer questions over documents with charts, tables, and images
Voice Assistants: Build conversational AI with voice input
Compliance: Detect PII and enforce content policies
Research: Query scientific papers with visual content
License
This project uses NVIDIA open models. Each model is governed by its own license.
Contributing
Contributions are welcome! Please read our contributing guidelines before submitting PRs.