For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Digest
  • Getting Started
    • Quickstart
    • Introduction
    • Local Installation
    • Building from Source
    • Kubernetes Deployment
    • Contribution Guide
  • Resources
    • Support Matrix
    • Feature Matrix
    • Release Artifacts
    • Examples
    • Glossary
  • Digest
    • Dynamo Day 0 support for TokenSpeed
    • Multi-Turn Agentic Harnesses
    • Full-Stack Optimizations for Agentic Inference
    • Flash Indexer: Inter-Galactic KV Routing
  • Kubernetes Deployment
  • User Guides
    • Disaggregated Serving
    • KV Cache Aware Routing
    • KV Cache Offloading
    • Tool Calling
    • Reasoning
    • Agents
      • Agent Tracing
      • Agent Hints
      • SGLang for Agentic Workloads
    • Multimodal
    • Diffusion
    • LoRA Adapters
    • Observability (Local)
    • Fault Tolerance
    • Benchmarking
    • Writing Python Workers
  • Backends
    • SGLang
    • TensorRT-LLM
    • vLLM
  • Components
    • Frontend
    • Router
    • Planner
    • Profiler
    • KVBM
  • Integrations
    • LMCache
    • SGLang HiCache
    • FlexKV
    • KV Events for Custom Engines
  • Design Docs
    • Overall Architecture
    • Architecture Flow
    • Disaggregated Serving
    • Distributed Runtime
  • Documentation
    • Dynamo Docs Guide
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoDocumentation
Digest
On this page
  • Core Concepts
  • Backend-Specific Guides
  • Request Surface
User Guides

Agents

Agent-aware serving features in Dynamo

||View as Markdown|
Previous

Reasoning Parsing (Engine Fallback)

Next

Agent Tracing

Dynamo provides a small set of request extensions and trace utilities for serving agentic workloads. The harness remains responsible for the semantic agent trajectory. Dynamo receives lightweight metadata and uses it for serving telemetry, routing hints, and backend-specific cache behavior.

Core Concepts

ConceptPurpose
Agent TracingPassive session_id/trajectory_id metadata plus Dynamo-owned request timing, token, cache, worker-placement, and harness tool-event traces.
Agent HintsOptional per-request hints such as priority, expected output length, and speculative prefill.
Tool CallingSupported tool-call parsers and parser names.
ReasoningSupported reasoning parsers for chain-of-thought models.
Chat ProcessorsDynamo, vLLM, and SGLang preprocessing options.

Backend-Specific Guides

Agent features are exposed through common request metadata, but backend support varies by runtime.

Backend GuideContents
SGLang for Agentic WorkloadsPriority scheduling, priority-based radix eviction, speculative prefill, and streaming session control for subagent KV isolation.

Request Surface

Agent-facing request metadata lives under nvext on OpenAI-compatible request bodies:

1{
2 "nvext": {
3 "agent_context": {
4 "session_type_id": "deep_research",
5 "session_id": "research-run-42",
6 "trajectory_id": "research-run-42:researcher"
7 },
8 "agent_hints": {
9 "priority": 5,
10 "osl": 1024
11 }
12 }
13}

Use agent_context when you want traceability across LLM calls, tool calls, and external trajectory files. Use agent_hints only when the harness has serving-relevant intent that Dynamo can act on.