Configuration Reference

The AI-Q blueprint is configured through a single YAML file that defines LLMs, tools, agents, and the workflow. The NeMo Agent Toolkit reads this file at startup and wires everything together.

Config File Structure

Every config file has four top-level sections:

general:     # Telemetry, logging, front-end settings
llms:        # LLM definitions (model, endpoint, parameters)
functions:   # Tools and agents (search tools, classifiers, research agents)
workflow:    # Top-level orchestrator configuration

Environment Variable Substitution

You can reference environment variables anywhere in the YAML using shell-style syntax:

# Required variable (fails if not set)
api_key: ${NVIDIA_API_KEY}

# Variable with a default value
checkpoint_db: ${AIQ_CHECKPOINT_DB:-./checkpoints.db}

# Default applied to a collection name
collection_name: ${COLLECTION_NAME:-test_collection}

The syntax ${VAR_NAME} substitutes the value of the environment variable. The syntax ${VAR_NAME:-default} provides a fallback value if the variable is not set. Environment variables are typically defined in deploy/.env or .env at the project root.
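
The two forms can be illustrated with a small Python sketch. This is not the NeMo Agent Toolkit's implementation: expand_env is a hypothetical helper, and it assumes that a variable set to an empty string still counts as set, which the description above does not specify.

```python
import os
import re

# Matches ${NAME} or ${NAME:-default}; the default may be empty but cannot contain '}'.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(text, env=None):
    """Expand ${VAR} and ${VAR:-default} references in a string (illustrative only)."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]          # variable is set: use its value
        if default is not None:
            return default            # variable is unset: use the fallback
        # No value and no fallback: treat as a required variable.
        raise KeyError(f"required environment variable {name} is not set")

    return _PATTERN.sub(replace, text)
```

With an empty environment, expand_env("${AIQ_CHECKPOINT_DB:-./checkpoints.db}", {}) returns "./checkpoints.db", while expand_env("${NVIDIA_API_KEY}", {}) raises KeyError.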


general Section

Controls telemetry, logging, and the application front-end.

general:
  use_uvloop: true          # Use uvloop for better async performance (web mode)
  telemetry:
    logging:
      console:
        _type: console
        level: INFO          # DEBUG, INFO, WARNING, ERROR
    tracing:
      phoenix:               # Optional: Phoenix observability
        _type: phoenix
        endpoint: http://localhost:6006/v1/traces
        project: dev
  front_end:                 # Only for web/API mode
    _type: aiq_api
    runner_class: aiq_api.plugin.AIQAPIWorker
    db_url: ${NAT_JOB_STORE_DB_URL:-sqlite+aiosqlite:///./jobs.db}
    expiry_seconds: 86400
    cors:
      allow_origin_regex: 'http://localhost(:\d+)?|http://127.0.0.1(:\d+)?'
      allow_methods: [GET, POST, DELETE, OPTIONS]
      allow_headers: ["*"]
      allow_credentials: true
      expose_headers: ["*"]

| Parameter | Type | Default | Description |
|---|---|---|---|
| use_uvloop | bool | false | Enable uvloop for improved async I/O performance. Recommended for web mode. |
| telemetry.logging.console._type | str | console | Logging backend type. |
| telemetry.logging.console.level | str | INFO | Log level: DEBUG, INFO, WARNING, ERROR. |
| telemetry.tracing | object | | Optional tracing configuration (Phoenix, OpenTelemetry). |
| front_end._type | str | | Front-end type. Use aiq_api for the web API server. Omit for CLI mode. |
| front_end.db_url | str | sqlite+aiosqlite:///./jobs.db | Database URL for async job persistence. |
| front_end.expiry_seconds | int | 86400 | How long completed jobs remain in the database (seconds). |
| front_end.cors | object | | CORS settings for the API server. |
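
Before deploying, you can check offline which origins the allow_origin_regex value from the example accepts. The sketch below assumes the server requires the whole origin string to match the pattern (the behavior of Starlette's CORSMiddleware; whether aiq_api uses that middleware is an assumption). origin_allowed is a hypothetical helper, not part of the toolkit.

```python
import re

# Pattern copied from the general.front_end.cors example.
ALLOW_ORIGIN_REGEX = r"http://localhost(:\d+)?|http://127.0.0.1(:\d+)?"


def origin_allowed(origin, pattern=ALLOW_ORIGIN_REGEX):
    """Return True when the entire origin string matches the pattern."""
    return re.fullmatch(pattern, origin) is not None


if __name__ == "__main__":
    for candidate in ("http://localhost:3000", "https://localhost:3000", "http://example.com"):
        print(candidate, origin_allowed(candidate))
```

Because the dots in 127.0.0.1 are unescaped, the pattern also accepts a host such as 127x0y0z1. Writing the address as 127\.0\.0\.1 closes that gap.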


llms Section

Defines named LLM instances. Each entry gets a user-chosen key (for example, nemotron_llm) that agents reference.

llms:
  nemotron_llm:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    base_url: "https://integrate.api.nvidia.com/v1"
    temperature: 0.1
    top_p: 0.3
    max_tokens: 16384
    num_retries: 5
    chat_template_kwargs:
      enable_thinking: true

| Parameter | Type | Default | Description |
|---|---|---|---|
| _type | str | required | LLM provider type. Use nim for NVIDIA NIM endpoints, openai for OpenAI-compatible endpoints. |
| model_name | str | required | Model identifier (for example, nvidia/nemotron-3-nano-30b-a3b, azure/openai/gpt-4.1-mini). |
| base_url | str | None | API endpoint URL. Set this explicitly for NVIDIA NIM endpoints. |
| api_key | str | | API key. If omitted, uses NVIDIA_API_KEY from the environment. |
| temperature | float | None | Sampling temperature. Lower values produce more deterministic output. When None, the API uses its server-side default. |
| top_p | float | None | Nucleus sampling threshold. When None, the API uses its server-side default. |
| max_tokens | int | 300 | Maximum tokens in the response. Set higher values (for example, 16384 or 128000) for research agents. |
| num_retries | int | 5 | Number of retry attempts on API failure. |
| chat_template_kwargs | object | | Extra arguments passed to the chat template. Use enable_thinking: true to activate the model’s chain-of-thought reasoning. |

Common LLM Configurations

Different agents benefit from different parameter profiles:

| Role | Temperature | Top-p | Max Tokens | Notes |
|---|---|---|---|---|
| Intent classifier | 0.5 | 0.9 | 4096 | Moderate creativity for classification |
| Shallow researcher | 0.1 | 0.3 | 16384 | Low temperature for factual accuracy |
| Deep research orchestrator | 1.0 | 1.0 | 128000 | High temperature with thinking enabled for deep reasoning |
| Summary LLM | 0.3 | | 100 | Conservative, short output for document summaries |


functions Section

Defines tools and agents. Each entry has a _type field that maps to a registered NeMo Agent Toolkit plugin. The key you assign (for example, web_search_tool) becomes the name used in tools lists.
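
Because agents refer to LLMs and tools by these user-chosen keys, a misspelled key is an easy mistake to make. The sketch below checks an already-parsed config (a plain dict) for references that point nowhere. find_dangling_references is a hypothetical helper, and the set of LLM-reference fields is taken from the examples on this page rather than from the toolkit's schema.

```python
# Fields that hold the key of an entry in the llms section (per the examples on this page).
LLM_REFERENCE_FIELDS = {"llm", "planner_llm", "orchestrator_llm", "researcher_llm", "summary_model"}


def find_dangling_references(config):
    """Return messages for LLM or tool references that are not defined in the config."""
    llms = set(config.get("llms") or {})
    functions = config.get("functions") or {}
    problems = []
    for name, spec in functions.items():
        # Every LLM reference must name a key under llms.
        for field in sorted(LLM_REFERENCE_FIELDS & set(spec)):
            target = spec[field]
            if target is not None and target not in llms:
                problems.append(f"{name}.{field}: unknown LLM '{target}'")
        # Every tool reference must name a key under functions.
        for tool in spec.get("tools", []):
            if tool not in functions:
                problems.append(f"{name}.tools: unknown tool '{tool}'")
    return problems
```

An empty result means every reference resolves. It does not mean the toolkit will accept the file, since field types and required fields are not checked here.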

knowledge_retrieval

Semantic search over ingested documents. Supports two backends: LlamaIndex (local ChromaDB) and Foundational RAG (hosted NVIDIA RAG Blueprint).

functions:
  # LlamaIndex backend
  knowledge_search:
    _type: knowledge_retrieval
    backend: llamaindex
    collection_name: ${COLLECTION_NAME:-test_collection}
    top_k: 5
    chroma_dir: ${AIQ_CHROMA_DIR:-/tmp/chroma_data}
    generate_summary: true
    summary_model: summary_llm
    summary_db: ${AIQ_SUMMARY_DB:-sqlite+aiosqlite:///./summaries.db}

# Alternative configuration: use this in place of the LlamaIndex block above
functions:
  # Foundational RAG backend
  knowledge_search:
    _type: knowledge_retrieval
    backend: foundational_rag
    collection_name: ${COLLECTION_NAME:-test_collection}
    top_k: 5
    rag_url: ${RAG_SERVER_URL:-http://localhost:8081/v1}
    ingest_url: ${RAG_INGEST_URL:-http://localhost:8082/v1}
    timeout: 300
    # verify_ssl: false            # Only set to false for self-signed certs

| Parameter | Type | Default | Description |
|---|---|---|---|
| backend | str | llamaindex | Backend type: llamaindex or foundational_rag. |
| collection_name | str | default | Name of the document collection/index. |
| top_k | int | 5 | Number of results to return per query. |
| generate_summary | bool | false | Generate one-sentence summaries for ingested documents. |
| summary_model | str | None | LLM reference from llms section. Required when generate_summary: true. |
| summary_db | str | sqlite+aiosqlite:///./summaries.db | Database URL for document summaries (SQLite or PostgreSQL). |
| chroma_dir | str | /tmp/chroma_data | ChromaDB persistence directory. LlamaIndex backend only. |
| rag_url | str | http://localhost:8081/v1 | RAG query server URL. Foundational RAG backend only. |
| ingest_url | str | http://localhost:8082/v1 | RAG ingestion server URL. Foundational RAG backend only. |
| timeout | int | 120 | Request timeout in seconds. Foundational RAG backend only. |
| verify_ssl | bool | true | Verify SSL certificates. Set false for self-signed certs. Foundational RAG backend only. |
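
Several parameters apply to only one backend, and summary_model becomes required once generate_summary is true. A rough Python sketch of those two rules follows. check_knowledge_retrieval is a hypothetical helper operating on a parsed entry (a dict); how the toolkit itself treats a parameter meant for the other backend is not stated here, so the sketch only reports it.

```python
# Backend-specific parameters, as listed in the table above.
BACKEND_ONLY = {
    "llamaindex": {"chroma_dir"},
    "foundational_rag": {"rag_url", "ingest_url", "timeout", "verify_ssl"},
}


def check_knowledge_retrieval(spec):
    """Return warnings for a knowledge_retrieval entry (illustrative checks only)."""
    backend = spec.get("backend", "llamaindex")  # documented default
    if backend not in BACKEND_ONLY:
        return [f"unknown backend {backend!r}"]
    warnings = []
    # Flag parameters that belong to the backend that is not selected.
    for other, keys in BACKEND_ONLY.items():
        if other != backend:
            for key in sorted(keys & set(spec)):
                warnings.append(f"{key} applies only to the {other} backend")
    # summary_model is required when summaries are enabled.
    if spec.get("generate_summary") and not spec.get("summary_model"):
        warnings.append("summary_model is required when generate_summary is true")
    return warnings
```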

intent_classifier

Classifies user queries as meta (conversational) or research, and determines research depth (shallow vs. deep).

functions:
  intent_classifier:
    _type: intent_classifier
    llm: nemotron_llm_intent
    tools:
      - web_search_tool
      - paper_search_tool
    verbose: true
    llm_timeout: 90

| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | str | required | Reference to an LLM defined in llms section. |
| tools | list[str] | [] | Tool references passed to the intent prompt for tool-awareness. |
| verbose | bool | false | Enable verbose logging with trace callbacks. |
| llm_timeout | float | 90 | Timeout in seconds for the intent classification LLM call. |

clarifier_agent

Interactive clarification dialog for deep research queries. Asks follow-up questions to refine scope before research begins.

functions:
  clarifier_agent:
    _type: clarifier_agent
    llm: nemotron_llm
    planner_llm: nemotron_llm
    tools:
      - web_search_tool
    max_turns: 3
    enable_plan_approval: true
    max_plan_iterations: 10
    log_response_max_chars: 2000
    verbose: true

| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | str | required | LLM for generating clarification questions. |
| planner_llm | str | None | LLM for plan generation. Falls back to llm if not specified. |
| tools | list[str] | [] | Tools available for gathering context during clarification. |
| max_turns | int | 3 | Maximum number of clarification Q&A turns before auto-completing. |
| enable_plan_approval | bool | false | Show research plan to the user for approval after clarification. |
| max_plan_iterations | int | 10 | Maximum plan feedback iterations before auto-approving. |
| log_response_max_chars | int | 2000 | Maximum characters to log from LLM responses. |
| verbose | bool | false | Enable verbose logging. |

shallow_research_agent

Fast, single-pass research agent that produces citation-backed answers in one tool-calling loop.

functions:
  shallow_research_agent:
    _type: shallow_research_agent
    llm: nemotron_llm
    tools:
      - web_search_tool
      - knowledge_search
    max_llm_turns: 10
    max_tool_iterations: 5
    verbose: true

| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | str | required | LLM for research and synthesis. |
| tools | list[str] | [] | Search tools available to the agent. |
| max_llm_turns | int | 10 | Maximum number of LLM turns (includes both reasoning and tool-calling steps). |
| max_tool_iterations | int | 5 | Maximum tool-calling iterations before forcing synthesis. |
| verbose | bool | false | Enable verbose logging. |
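
One way to picture how max_llm_turns and max_tool_iterations interact is a loop with two counters. The sketch below is a simplified model, not the agent's actual code: run_research_loop and its step callback are hypothetical, and it assumes that once the tool budget is spent the model is called again with tools disabled so that it must synthesize an answer.

```python
def run_research_loop(step, max_llm_turns=10, max_tool_iterations=5):
    """Drive a tool-calling loop under two budgets.

    step(tools_allowed) stands in for one LLM call and returns either
    ("tool", request) or ("answer", text).
    Returns (answer or None, llm_turns_used, tool_iterations_used).
    """
    tool_iterations = 0
    for llm_turns in range(1, max_llm_turns + 1):
        tools_allowed = tool_iterations < max_tool_iterations
        kind, payload = step(tools_allowed)
        if kind == "answer":
            return payload, llm_turns, tool_iterations
        if tools_allowed:
            tool_iterations += 1  # a tool call was executed on this turn
        # If tools are no longer allowed, the request is ignored and the model is asked again.
    return None, max_llm_turns, tool_iterations  # turn budget exhausted without an answer


def eager_model(tools_allowed):
    """Stub model: keeps calling tools while allowed, then answers."""
    return ("tool", "search") if tools_allowed else ("answer", "done")
```

With the defaults, the stub uses five tool iterations and answers on the sixth LLM turn. Lowering max_llm_turns to 3 ends the loop before any answer is produced, which is why the turn budget is normally set higher than the tool budget.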

deep_research_agent

Multi-phase research agent with separate orchestrator, planner, and researcher sub-agents that produces long-form reports.

functions:
  deep_research_agent:
    _type: deep_research_agent
    orchestrator_llm: nemotron_llm_deep
    researcher_llm: nemotron_llm_deep
    planner_llm: nemotron_llm_deep
    tools:
      - paper_search_tool
      - advanced_web_search_tool
      - knowledge_search
    max_loops: 2
    verbose: true

| Parameter | Type | Default | Description |
|---|---|---|---|
| orchestrator_llm | str | required | LLM for the orchestrator that coordinates the research workflow. |
| researcher_llm | str | None | LLM for the researcher sub-agent. Falls back to orchestrator_llm if not specified. |
| planner_llm | str | None | LLM for the planner sub-agent. Falls back to orchestrator_llm if not specified. |
| tools | list[str] | [] | Search tools available to the researcher sub-agent. |
| max_loops | int | 2 | Maximum number of orchestrator planning/research loops. |
| verbose | bool | true | Enable verbose logging. |
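
The fallback rule for researcher_llm and planner_llm can be written out in a few lines. This is a sketch of the documented behavior, not the agent's source; resolve_deep_research_llms is a hypothetical helper that takes the parsed deep_research_agent entry as a dict.

```python
def resolve_deep_research_llms(spec):
    """Apply the documented fallbacks: sub-agent LLMs default to orchestrator_llm."""
    orchestrator = spec["orchestrator_llm"]  # required; raises KeyError when missing
    return {
        "orchestrator_llm": orchestrator,
        "researcher_llm": spec.get("researcher_llm") or orchestrator,
        "planner_llm": spec.get("planner_llm") or orchestrator,
    }
```

An entry that sets only orchestrator_llm: deep_llm therefore resolves all three roles to deep_llm.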


workflow Section

Defines the top-level orchestrator that wires together all agents.

workflow:
  _type: chat_deepresearcher_agent
  enable_escalation: true
  enable_clarifier: true
  use_async_deep_research: true
  max_history: 20
  verbose: true
  checkpoint_db: ${AIQ_CHECKPOINT_DB:-./checkpoints.db}

| Parameter | Type | Default | Description |
|---|---|---|---|
| _type | str | required | Workflow type. Use chat_deepresearcher_agent for the full pipeline. |
| enable_escalation | bool | true | Allow the intent classifier to route queries to deep research. When false, all research queries use shallow research only. |
| enable_clarifier | bool | true | Run the clarifier agent before deep research to gather user requirements. |
| use_async_deep_research | bool | false | Submit deep research as an async background job (requires Dask scheduler). |
| max_history | int | 20 | Maximum number of messages to keep in conversation history before trimming. |
| verbose | bool | false | Enable verbose logging. |
| checkpoint_db | str | ./checkpoints.db | SQLite path or PostgreSQL DSN for persistent conversation checkpoints. |

Note: interactive_auth is a YAML-level field consumed by the CLI entry point (start_cli.sh / aiq-research), not a Pydantic field on ChatDeepResearcherConfig. It can be set in YAML config files but is not part of the workflow config class.
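
The effect of enable_escalation and enable_clarifier on routing can be summarized as a small decision function. The sketch below models only what the parameter descriptions state. plan_route is a hypothetical helper, and the "meta_response" label for conversational replies is a placeholder name, not a toolkit identifier.

```python
def plan_route(intent, depth, enable_escalation=True, enable_clarifier=True):
    """Return the ordered steps a query would pass through.

    intent is "meta" or "research"; depth is "shallow" or "deep",
    as produced by the intent classifier.
    """
    if intent == "meta":
        return ["meta_response"]  # placeholder label for a conversational reply
    if depth == "deep" and enable_escalation:
        steps = ["clarifier_agent"] if enable_clarifier else []
        return steps + ["deep_research_agent"]
    # Shallow queries, and all research queries when escalation is disabled.
    return ["shallow_research_agent"]
```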


Complete Annotated Example

Below is a complete configuration for CLI mode with web search, paper search, and clarification enabled:

# General settings
general:
  telemetry:
    logging:
      console:
        _type: console
        level: INFO                    # Set to DEBUG for troubleshooting

# LLM definitions
llms:
  intent_llm:                          # Used by intent classifier
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    base_url: "https://integrate.api.nvidia.com/v1"
    temperature: 0.5
    top_p: 0.9
    max_tokens: 4096
    num_retries: 5
    chat_template_kwargs:
      enable_thinking: true

  research_llm:                        # Used by shallow researcher + clarifier
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    base_url: "https://integrate.api.nvidia.com/v1"
    temperature: 0.1
    top_p: 0.3
    max_tokens: 16384
    num_retries: 5
    chat_template_kwargs:
      enable_thinking: true

  deep_llm:                            # Used by deep research orchestrator
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    base_url: "https://integrate.api.nvidia.com/v1"
    temperature: 1.0
    top_p: 1.0
    max_tokens: 128000
    num_retries: 5
    chat_template_kwargs:
      enable_thinking: true

# Tools and agents
functions:
  web_search_tool:                     # Standard web search
    _type: tavily_web_search
    max_results: 5
    max_content_length: 1000

  advanced_web_search_tool:            # Deep search (fewer results, more depth)
    _type: tavily_web_search
    max_results: 2
    advanced_search: true

  paper_search_tool:                   # Academic paper search
    _type: paper_search
    max_results: 5
    serper_api_key: ${SERPER_API_KEY}

  intent_classifier:                   # Classifies queries, routes depth
    _type: intent_classifier
    llm: intent_llm
    tools:
      - web_search_tool
      - paper_search_tool

  clarifier_agent:                     # Asks clarifying questions for deep research
    _type: clarifier_agent
    llm: research_llm
    planner_llm: research_llm
    tools:
      - web_search_tool
    max_turns: 3
    enable_plan_approval: true
    verbose: true

  shallow_research_agent:              # Fast single-pass research
    _type: shallow_research_agent
    llm: research_llm
    tools:
      - web_search_tool
    max_llm_turns: 10
    max_tool_iterations: 5

  deep_research_agent:                 # Multi-phase deep research
    _type: deep_research_agent
    orchestrator_llm: deep_llm
    tools:
      - paper_search_tool
      - advanced_web_search_tool
    max_loops: 2

# Top-level orchestrator
workflow:
  _type: chat_deepresearcher_agent
  enable_escalation: true              # Allow deep research routing
  enable_clarifier: true               # Ask clarifying questions first
  checkpoint_db: ${AIQ_CHECKPOINT_DB:-./checkpoints.db}

Provided Config Files

The repository includes several pre-built configurations:

| File | Mode | Features |
|---|---|---|
| configs/config_cli_default.yml | CLI | Web search, paper search, clarifier with plan approval |
| configs/config_web_default_llamaindex.yml | Web API | LlamaIndex knowledge retrieval, web search, paper search |
| configs/config_web_frag.yml | Web API | Foundational RAG knowledge retrieval, web search, paper search |