Agents | NVIDIA Dynamo Documentation

Dynamo provides a small set of request extensions and trace utilities for serving agentic workloads. The harness remains responsible for the semantic agent trajectory. Dynamo receives lightweight metadata and uses it for serving telemetry, routing hints, and backend-specific cache behavior.

Core Concepts

Concept	Purpose
Agent Tracing	Passive `session_id`/`trajectory_id` metadata plus Dynamo-owned request timing, token, cache, worker-placement, and harness tool-event traces.
Agent Hints	Optional per-request hints such as priority, expected output length, and speculative prefill.
Tool Calling	Supported tool-call parsers and parser names.
Reasoning	Supported reasoning parsers for chain-of-thought models.
Chat Processors	Dynamo, vLLM, and SGLang preprocessing options.

Backend-Specific Guides

Agent features are exposed through common request metadata, but backend support varies by runtime.

Backend Guide	Contents
SGLang for Agentic Workloads	Priority scheduling, priority-based radix eviction, speculative prefill, and streaming session control for subagent KV isolation.

Request Surface

Agent-facing request metadata lives under nvext on OpenAI-compatible request bodies:

1 {
2     "nvext": {
3         "agent_context": {
4             "session_type_id": "deep_research",
5             "session_id": "research-run-42",
6             "trajectory_id": "research-run-42:researcher"
7         },
8         "agent_hints": {
9             "priority": 5,
10             "osl": 1024
11         }
12     }
13 }

Use agent_context when you want traceability across LLM calls, tool calls, and external trajectory files. Use agent_hints only when the harness has serving-relevant intent that Dynamo can act on.