Copyright © 2026, NVIDIA Corporation.

Request Plane

Overview

Dynamo supports two transport mechanisms for its request plane (the communication layer between services):

  • TCP (default): Direct TCP connection for optimal performance
  • NATS: Message broker-based request plane

This guide explains how to configure and use the request plane in your Dynamo deployment.

What is a Request Plane?

The request plane is the transport layer that handles communication between Dynamo services (e.g., frontend to backend, worker to worker). Different request planes offer different trade-offs:

| Request Plane | Suitable For | Characteristics |
|---|---|---|
| NATS | Production deployments with KV routing | Requires NATS infrastructure, provides pub/sub patterns, highest flexibility |
| TCP | Low-latency direct communication | Direct connections, minimal overhead |

Request Plane vs KV Event Plane

Dynamo has two independent communication planes:

  • Request plane (DYN_REQUEST_PLANE): how RPC requests flow between components (frontend → router → worker), via tcp or nats.
  • KV event plane (currently only NATS is supported): how KV cache events (and optional router replica sync) are distributed/persisted for KV-aware routing.

Note: If you use the tcp request plane with KV events enabled on the router (the default router-side setting), NATS is initialized automatically. Backends differ in how they publish events: SGLang requires an explicit --kv-events-config, and TRT-LLM requires --publish-events-and-metrics. For vLLM, KV events are currently auto-configured when prefix caching is active; this behavior is deprecated, so set --kv-events-config explicitly to prepare for a future release in which all backends default to off. To use a custom NATS server, set the NATS_SERVER environment variable (e.g., NATS_SERVER=nats://nats-hostname:port); otherwise it defaults to localhost:4222. To disable the router’s KV event listener, pass --no-router-kv-events to the frontend.

Because they are independent, you can mix them.

For example, a deployment with TCP request plane can use different KV event planes:

  • JetStream KV events: requests use TCP, KV routing still uses NATS JetStream + object store for persistence.
  • NATS Core KV events (local indexer): requests use TCP, KV events use NATS Core pub/sub and persistence lives on workers.
  • No KV events: requests use TCP and KV routing runs without events (no NATS required, but no event-backed persistence).
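
To make the combinations concrete, the sketch below prints (it does not run) a frontend launch line for the TCP request plane, with the router's KV event listener either on or off. It uses only the variables and flags named in this section; the helper name frontend_launch_line is hypothetical, and choosing between JetStream and NATS Core for KV events is not shown because this section does not name the setting that controls it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a frontend launch line for the TCP request plane.
# It only echoes the command so the options can be compared side by side.
frontend_launch_line() {
  local kv_events="$1"                          # "on" or "off"
  local nats_url="${2:-nats://localhost:4222}"  # default NATS address per the note above

  if [ "$kv_events" = "off" ]; then
    # No KV events: the router does not listen for events, so NATS is not needed.
    echo "DYN_REQUEST_PLANE=tcp python -m dynamo.frontend --http-port=8000 --no-router-kv-events"
  else
    # KV events on (router default): requests use TCP, events travel over NATS.
    echo "NATS_SERVER=$nats_url DYN_REQUEST_PLANE=tcp python -m dynamo.frontend --http-port=8000"
  fi
}

frontend_launch_line off
frontend_launch_line on nats://nats-hostname:4222
```

In both cases the request plane stays on TCP; only the event path changes.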

Configuration

Environment Variable

Set the request plane mode using the DYN_REQUEST_PLANE environment variable:

export DYN_REQUEST_PLANE=<mode>

Where <mode> is one of:

  • tcp (default)
  • nats

The value is case-insensitive.

Default Behavior

If DYN_REQUEST_PLANE is not set or contains an invalid value, Dynamo defaults to tcp.
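
The rules above (case-insensitive value, tcp as the fallback) can be sketched as a small preflight helper. This is an illustration in shell, not Dynamo's own parsing code: the function name resolve_request_plane is hypothetical, and printing a warning for an unrecognized value is an added assumption.

```shell
#!/usr/bin/env bash
# Sketch of the documented resolution rules; not Dynamo's implementation.
resolve_request_plane() {
  local raw="${1:-}"
  local mode
  # The value is case-insensitive, so normalize to lowercase first.
  mode="$(printf '%s' "$raw" | tr '[:upper:]' '[:lower:]')"

  case "$mode" in
    tcp|nats)
      printf '%s\n' "$mode"
      ;;
    "")
      # Unset or empty: the default applies.
      printf 'tcp\n'
      ;;
    *)
      # Assumption: warn on stderr, then fall back to the default.
      echo "warning: unrecognized DYN_REQUEST_PLANE='$raw', using tcp" >&2
      printf 'tcp\n'
      ;;
  esac
}

# Print the mode a service started from this shell would be expected to use.
resolve_request_plane "${DYN_REQUEST_PLANE:-}"
```

Running this before launching services is a quick way to catch a misspelled mode in a deployment script.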

Usage Examples

Using TCP (Default)

TCP is the default request plane and provides direct, low-latency communication between services.

Configuration:

# TCP is the default, so no need to set DYN_REQUEST_PLANE explicitly
# But you can explicitly set it if desired:
export DYN_REQUEST_PLANE=tcp

# Optional: Configure TCP server host and port
export DYN_TCP_RPC_HOST=0.0.0.0 # Default host
# export DYN_TCP_RPC_PORT=9999 # Optional: specify a fixed port

# Run your Dynamo service
DYN_REQUEST_PLANE=tcp python -m dynamo.frontend --http-port=8000 &
DYN_REQUEST_PLANE=tcp python -m dynamo.vllm --model Qwen/Qwen3-0.6B

Note: By default, TCP uses an OS-assigned free port (port 0). This is ideal for environments where multiple services may run on the same machine or when you want to avoid port conflicts. If you need a specific port (e.g., for firewall rules), set DYN_TCP_RPC_PORT explicitly.

When to use TCP:

  • Simple deployments with direct service-to-service communication (e.g. frontend to backend)
  • Minimal infrastructure requirements (NATS is initialized when the router listens for KV events; disable with --no-router-kv-events)
  • Low-latency requirements

TCP Configuration Options:

Additional TCP-specific environment variables:

  • DYN_TCP_RPC_HOST: Server host address (default: auto-detected)
  • DYN_TCP_RPC_PORT: Server port. If not set, the OS assigns a free port automatically (recommended for most deployments). Set explicitly only if you need a specific port for firewall rules.
  • DYN_TCP_MAX_MESSAGE_SIZE: Maximum message size for TCP client (default: 32MB)
  • DYN_TCP_SHRINK_MESSAGE_SIZE: Threshold for shrinking the zero-copy decoder buffer back to initial size after processing large messages (default: 8MB, max: DYN_TCP_MAX_MESSAGE_SIZE)
  • DYN_TCP_REQUEST_TIMEOUT: Request timeout for TCP client (default: 10 seconds)
  • DYN_TCP_POOL_SIZE: Connection pool size for TCP client (default: 50)
  • DYN_TCP_CONNECT_TIMEOUT: Connect timeout for TCP client (default: 3 seconds)
  • DYN_TCP_CHANNEL_BUFFER: Request channel buffer size for TCP client (default: 100)

Using NATS

NATS provides durable JetStream messaging for the request plane and can also be used for KV events (and router replica sync).

Prerequisites:

  • NATS server must be running and accessible
  • Configure NATS connection via standard Dynamo NATS environment variables

Configuration:

# Explicitly set to NATS
export DYN_REQUEST_PLANE=nats

# Run your Dynamo service
DYN_REQUEST_PLANE=nats python -m dynamo.frontend --http-port=8000 &
DYN_REQUEST_PLANE=nats python -m dynamo.vllm --model Qwen/Qwen3-0.6B

When to use NATS:

  • Production deployments with service discovery
  • KV-aware routing with accurate cache state tracking (requires NATS for event transport). Note: approximate mode (--no-router-kv-events) provides KV routing without NATS but with reduced accuracy.
  • Need for message replay and persistence features

Limitations:

  • NATS does not support payloads beyond 16MB (use TCP for larger payloads)
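
As a rough illustration of how the payload limit affects the choice of request plane, the sketch below compares a payload size in bytes against the 16MB NATS limit and the 32MB default of DYN_TCP_MAX_MESSAGE_SIZE. Two things are assumptions here: that "MB" means 2^20 bytes, and that the payload size can be compared directly with the message size limit (any framing overhead is ignored). The function name suggest_request_plane is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: "16MB" and "32MB" are read as MiB (2^20 bytes).
NATS_MAX_PAYLOAD_BYTES=$((16 * 1024 * 1024))
TCP_DEFAULT_MAX_MESSAGE_BYTES=$((32 * 1024 * 1024))

# Hypothetical helper: suggests which request plane can carry a payload.
suggest_request_plane() {
  local bytes="$1"
  if [ "$bytes" -le "$NATS_MAX_PAYLOAD_BYTES" ]; then
    echo "tcp or nats"
  elif [ "$bytes" -le "$TCP_DEFAULT_MAX_MESSAGE_BYTES" ]; then
    echo "tcp"
  else
    # Larger than the TCP default: the limit would need to be raised.
    echo "tcp (raise DYN_TCP_MAX_MESSAGE_SIZE)"
  fi
}

suggest_request_plane $((20 * 1024 * 1024))   # prints: tcp
```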

Complete Example

For a complete working example that launches a Dynamo deployment with either the TCP or the NATS request plane, see examples/backends/vllm/launch/agg_request_planes.sh.

Real-World Example

The Dynamo repository includes a complete example demonstrating both request planes:

Location: examples/backends/vllm/launch/agg_request_planes.sh

cd examples/backends/vllm/launch

# Run with TCP
./agg_request_planes.sh --tcp

# Run with NATS
./agg_request_planes.sh --nats

Architecture Details

Network Manager

The request plane implementation is centralized in the Network Manager (lib/runtime/src/pipeline/network/manager.rs), which:

  1. Reads the DYN_REQUEST_PLANE environment variable at startup
  2. Creates the appropriate server and client implementations
  3. Provides a transport-agnostic interface to the rest of the codebase
  4. Manages all network configuration and lifecycle

Transport Abstraction

All request plane implementations conform to common trait interfaces:

  • RequestPlaneServer: Server-side interface for receiving requests
  • RequestPlaneClient: Client-side interface for sending requests

This abstraction means your application code doesn’t need to change when switching request planes.

Configuration Loading

Request plane configuration is loaded from environment variables at startup and cached globally. The configuration hierarchy is:

  1. Mode Selection: DYN_REQUEST_PLANE (defaults to tcp)
  2. Transport-Specific Config: Mode-specific environment variables (e.g., DYN_TCP_*)

Migration Guide

From NATS to TCP

  1. Stop your Dynamo services
  2. Set environment variable DYN_REQUEST_PLANE=tcp
  3. Optionally configure TCP-specific settings (e.g., DYN_TCP_RPC_HOST). Note: DYN_TCP_RPC_PORT is optional; if not set, an OS-assigned free port is used automatically.
  4. Restart your services

Testing the Migration

After switching request planes, verify your deployment:

# Test with a simple request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
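
For scripted checks, the same request can be wrapped in a function that looks only at the HTTP status code. This is a minimal sketch, assuming the frontend listens on http://localhost:8000 as in the examples above and that a successful completion returns HTTP 200; the function name smoke_test and the timeout values are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the chat completions endpoint answers
# with HTTP 200, non-zero otherwise (including connection failures).
smoke_test() {
  local base_url="${1:-http://localhost:8000}"
  local model="${2:-Qwen/Qwen3-0.6B}"
  local body status

  body="$(printf '{"model": "%s", "messages": [{"role": "user", "content": "Hello!"}]}' "$model")"

  # -w '%{http_code}' prints only the status code; the response body is discarded.
  status="$(curl -s -o /dev/null -w '%{http_code}' \
    --connect-timeout 5 --max-time 60 \
    -H 'Content-Type: application/json' \
    -d "$body" \
    "$base_url/v1/chat/completions" || true)"

  [ "$status" = "200" ]
}
```

A typical use is `smoke_test && echo "request plane OK"` after restarting the services with the new DYN_REQUEST_PLANE value.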

Troubleshooting

Issue: Services Can’t Communicate

Symptoms: Requests time out or fail to reach the backend

Solutions:

  • Verify all services use the same DYN_REQUEST_PLANE setting
  • Check that server ports are not blocked by Kubernetes network policies or firewalls
  • For TCP: Ensure host/port configurations are correct and accessible
  • For NATS: Verify NATS server is running and accessible

Issue: “Invalid request plane mode” Error

Symptoms: Service fails to start with configuration error

Solutions:

  • Check DYN_REQUEST_PLANE spelling (valid values: nats, tcp)
  • Value is case-insensitive but must be one of the two options
  • If not set, defaults to tcp

Issue: Port Conflicts

Symptoms: Server fails to start due to “address already in use”

Solutions:

  • TCP: By default, TCP uses an OS-assigned free port, so port conflicts should be rare. If you explicitly set DYN_TCP_RPC_PORT to a specific port and get conflicts, either change the port or remove the setting to use automatic port assignment.

Performance Considerations

Latency

  • TCP: Lowest latency due to direct connections and binary serialization
  • NATS: Moderate latency due to NATS JetStream persistence

Resource Usage

  • TCP: Minimal infrastructure (NATS is required only if you use KV events; disable the router-side listener with --no-router-kv-events)
  • NATS: Requires running NATS server (additional memory/CPU)