# Example: Full Pipeline (LlamaIndex)
This page shows the complete AI-Q blueprint configuration using LlamaIndex and ChromaDB for knowledge retrieval. It is the recommended setup for local development because it requires no external RAG infrastructure.

The configuration is based on `configs/config_web_default_llamaindex.yml`.
## Configuration
```yaml
# config_web_default_llamaindex.yml (annotated)
# Full pipeline: Web mode with LlamaIndex knowledge layer

# ===========================================================================
# General settings
# ===========================================================================
general:
  use_uvloop: true
  telemetry:
    logging:
      console:
        _type: console
        level: INFO
  front_end:
    _type: aiq_api
    runner_class: aiq_api.plugin.AIQAPIWorker
    db_url: ${NAT_JOB_STORE_DB_URL:-sqlite+aiosqlite:///./jobs.db}
    expiry_seconds: 86400
    cors:
      allow_origin_regex: 'http://localhost(:\d+)?|http://127.0.0.1(:\d+)?'
      allow_methods: [GET, POST, DELETE, OPTIONS]
      allow_headers: ["*"]
      allow_credentials: true
      expose_headers: ["*"]

# ===========================================================================
# LLMs
# ===========================================================================
llms:
  nemotron_llm_intent:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    base_url: "https://integrate.api.nvidia.com/v1"
    temperature: 0.5
    top_p: 0.9
    max_tokens: 4096
    num_retries: 5
    chat_template_kwargs:
      enable_thinking: true

  nemotron_llm:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    base_url: "https://integrate.api.nvidia.com/v1"
    temperature: 0.1
    top_p: 0.3
    max_tokens: 16384
    num_retries: 5
    chat_template_kwargs:
      enable_thinking: true

  nemotron_llm_deep:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    base_url: "https://integrate.api.nvidia.com/v1"
    temperature: 1.0
    top_p: 1.0
    max_tokens: 128000
    num_retries: 5
    chat_template_kwargs:
      enable_thinking: true

  # LLM for document summaries (shown in the UI after upload)
  summary_llm:
    _type: nim
    model_name: nvidia/nemotron-mini-4b-instruct
    base_url: "https://integrate.api.nvidia.com/v1"
    api_key: ${NVIDIA_API_KEY}
    temperature: 0.3
    max_tokens: 100

# ===========================================================================
# Functions (tools and agents)
# ===========================================================================
functions:
  web_search_tool:
    _type: tavily_web_search
    max_results: 5
    max_content_length: 1000

  advanced_web_search_tool:
    _type: tavily_web_search
    max_results: 2
    advanced_search: true

  # -------------------------------------------------------------------------
  # Knowledge retrieval (LlamaIndex + ChromaDB)
  # -------------------------------------------------------------------------
  # Stores embeddings locally in ChromaDB. No external RAG server needed.
  # Documents are uploaded through the Knowledge API (/v1/collections).
  knowledge_search:
    _type: knowledge_retrieval
    backend: llamaindex
    collection_name: ${COLLECTION_NAME:-test_collection}
    generate_summary: true  # Generate per-doc summaries
    summary_model: summary_llm  # LLM for summaries
    summary_db: ${AIQ_SUMMARY_DB:-sqlite+aiosqlite:///./summaries.db}
    top_k: 5
    chroma_dir: ${AIQ_CHROMA_DIR:-/tmp/chroma_data}  # Local vector store

  # Paper Search (optional - requires SERPER_API_KEY)
  # Uncomment the block below and set SERPER_API_KEY to enable.
  # paper_search_tool:
  #   _type: paper_search
  #   max_results: 5
  #   serper_api_key: ${SERPER_API_KEY}

  intent_classifier:
    _type: intent_classifier
    llm: nemotron_llm_intent
    verbose: true
    tools:
      - web_search_tool
      # - paper_search_tool  # Uncomment if SERPER_API_KEY is set
      - knowledge_search

  clarifier_agent:
    _type: clarifier_agent
    llm: nemotron_llm
    planner_llm: nemotron_llm
    tools:
      - web_search_tool
      - knowledge_search
    max_turns: 3
    enable_plan_approval: true
    log_response_max_chars: 2000
    verbose: true

  shallow_research_agent:
    _type: shallow_research_agent
    llm: nemotron_llm
    verbose: true
    tools:
      - web_search_tool
      - knowledge_search
    max_llm_turns: 10
    max_tool_iterations: 5

  deep_research_agent:
    _type: deep_research_agent
    orchestrator_llm: nemotron_llm_deep
    max_loops: 2
    verbose: true
    tools:
      # - paper_search_tool  # Uncomment if SERPER_API_KEY is set
      - advanced_web_search_tool
      - knowledge_search

workflow:
  _type: chat_deepresearcher_agent
  verbose: true
  enable_escalation: true
  enable_clarifier: true
  use_async_deep_research: true
  checkpoint_db: ${AIQ_CHECKPOINT_DB:-./checkpoints.db}
```
## Required Environment Variables
```bash
# Core (required)
export NVIDIA_API_KEY="nvapi-..." # pragma: allowlist secret
export TAVILY_API_KEY="tvly-..." # pragma: allowlist secret
```
No RAG server URLs are needed because LlamaIndex stores embeddings in a local ChromaDB directory.
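The `nat serve` command under How to Run loads its variables from `deploy/.env`. A minimal sketch of that file follows, assuming the usual dotenv `KEY=value` format (which is also valid shell). The two API keys are the required ones; the rest are the optional overrides referenced in the configuration, shown with their config defaults. Assuming `knowledge_search` only queries the collection named by `collection_name`, `COLLECTION_NAME` is set here to `my-docs` to match the collection created under Upload Documents; without it, the config falls back to `test_collection`.

```bash
# deploy/.env (sketch): plain KEY=value lines.

# Core (required)
NVIDIA_API_KEY="nvapi-..." # pragma: allowlist secret
TAVILY_API_KEY="tvly-..." # pragma: allowlist secret

# Assumption: match the collection created in "Upload Documents".
COLLECTION_NAME="my-docs"

# Optional overrides, shown with the defaults from the config file.
AIQ_CHROMA_DIR="/tmp/chroma_data"
AIQ_SUMMARY_DB="sqlite+aiosqlite:///./summaries.db"
AIQ_CHECKPOINT_DB="./checkpoints.db"
NAT_JOB_STORE_DB_URL="sqlite+aiosqlite:///./jobs.db"

# Only needed if you enable paper_search_tool in the config.
# SERPER_API_KEY="..."
```

Because `/tmp` is often cleared on reboot, you may want to point `AIQ_CHROMA_DIR` at a persistent directory if uploaded collections should survive restarts.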
## How to Run
### Backend
```bash
source .venv/bin/activate
dotenv -f deploy/.env run .venv/bin/nat serve \
  --config_file configs/config_web_default_llamaindex.yml
```
The server starts at http://localhost:8000.
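When scripting the later steps (for example, uploading documents right after starting the backend), it helps to wait until the server accepts connections. A minimal sketch, assuming any HTTP response from `http://localhost:8000` means the server is up; `wait_for_server` is a hypothetical helper, not part of AI-Q:

```bash
# Hypothetical helper: poll a URL until the server answers or we give up.
# Without --fail, curl exits 0 on any HTTP response (even 404) and
# non-zero when the connection is refused or times out.
wait_for_server() {
  local url="${1:-http://localhost:8000}"
  local tries="${2:-30}"
  local i
  for ((i = 0; i < tries; i++)); do
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, `wait_for_server http://localhost:8000 60 && echo "backend is up"` waits up to about a minute.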
### Frontend (optional)
```bash
cd frontends/ui && npm run dev
```
Open http://localhost:3000 in your browser.
### Upload Documents
```bash
# Create a collection
curl -X POST http://localhost:8000/v1/collections \
  -H "Content-Type: application/json" \
  -d '{"name": "my-docs", "description": "My document collection"}'

# Upload files
curl -X POST http://localhost:8000/v1/collections/my-docs/documents \
  -F "files=@report.pdf"
```
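To load a whole directory, the same two requests can be scripted, sending one upload request per file. A minimal sketch, assuming one file per request is acceptable to the Knowledge API; `upload_dir`, `BASE_URL`, and `CURL` are hypothetical names for this sketch, and setting `CURL=echo` turns it into a dry run that prints each request instead of sending it:

```bash
# Hypothetical helper: create a collection, then upload every PDF in a directory.
BASE_URL="${BASE_URL:-http://localhost:8000}"
CURL="${CURL:-curl}"  # set CURL=echo for a dry run

upload_dir() {
  local collection="$1"
  local dir="$2"
  local f
  # Same collection-creation request as above.
  "$CURL" -sS -X POST "$BASE_URL/v1/collections" \
    -H "Content-Type: application/json" \
    -d "{\"name\": \"$collection\", \"description\": \"Uploaded with upload_dir\"}"
  # One upload request per PDF, using the files= form field shown above.
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue  # no PDFs: the unmatched glob stays literal, so skip it
    "$CURL" -sS -X POST "$BASE_URL/v1/collections/$collection/documents" \
      -F "files=@$f"
  done
}
```

For example, `upload_dir my-docs ./papers` creates `my-docs` and uploads each PDF in `./papers`.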
### Ask Questions
```bash
# Submit a query
curl -X POST http://localhost:8000/v1/jobs/async/submit \
  -H "Content-Type: application/json" \
  -d '{"agent_type": "shallow_researcher", "input": "What is CUDA?"}'

# Stream events (replace {job_id} with the ID returned by the submit request)
curl -N http://localhost:8000/v1/jobs/async/job/{job_id}/stream
```
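The two requests above can be chained in a small script. A minimal sketch, assuming the submit response is JSON with a top-level `job_id` field (inferred from the `{job_id}` placeholder in the stream URL, not confirmed) and that `python3` is on the `PATH` for JSON handling; `build_payload`, `extract_job_id`, `ask`, and `BASE_URL` are hypothetical names:

```bash
# Hypothetical helpers for submitting a query and streaming its events.
BASE_URL="${BASE_URL:-http://localhost:8000}"

# Build the request body with json.dumps so quotes in the question are escaped.
build_payload() {
  python3 -c 'import json, sys; print(json.dumps({"agent_type": sys.argv[1], "input": sys.argv[2]}))' "$1" "$2"
}

# Read the submit response on stdin and print its job ID.
# Assumption: the ID is in a top-level "job_id" field.
extract_job_id() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["job_id"])'
}

# Submit a question to the shallow researcher, then stream the job's events.
ask() {
  local job_id
  job_id=$(curl -sS -X POST "$BASE_URL/v1/jobs/async/submit" \
    -H "Content-Type: application/json" \
    -d "$(build_payload shallow_researcher "$1")" | extract_job_id) || return 1
  echo "job_id: $job_id" >&2
  curl -N "$BASE_URL/v1/jobs/async/job/$job_id/stream"
}
```

For example, `ask "What is CUDA?"` prints the job ID to stderr and then streams events until the server closes the connection.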
## Key Differences from Foundational RAG
| Aspect | LlamaIndex (this config) | Foundational RAG |
|---|---|---|
| Vector store | Local ChromaDB | Hosted RAG server |
| External infrastructure | None | RAG + ingest servers |
| Document summaries | Yes (`generate_summary: true`) | No |
| Best for | Local development | Production multi-user |
For production multi-user deployments, refer to Full Pipeline – Foundational RAG.