Dynamo Request Planes User Guide#

Overview#

Dynamo supports multiple transport mechanisms for its request plane (the communication layer between services). You can choose from three different request plane modes based on your deployment requirements:

  • TCP (default): Direct TCP connection for optimal performance

  • NATS: Message broker-based request plane

  • HTTP: HTTP/2-based request plane

This guide explains how to configure and use the request plane in your Dynamo deployment.

What is a Request Plane?#

The request plane is the transport layer that handles communication between Dynamo services (e.g., frontend to backend, worker to worker). Different request planes offer different trade-offs:

| Request Plane | Suitable For | Characteristics |
|---|---|---|
| NATS | Production deployments with KV routing | Requires NATS infrastructure, provides pub/sub patterns, highest flexibility |
| TCP | Low-latency direct communication | Direct connections, minimal overhead |
| HTTP | Standard deployments, debugging | HTTP/2 protocol, easier observability with standard tools, widely compatible |

Request Plane vs KV Event Plane#

Dynamo has two independent communication planes:

  • Request plane (DYN_REQUEST_PLANE): how RPC requests flow between components (frontend → router → worker), via tcp, http, or nats.

  • KV event plane (currently only NATS is supported): how KV cache events (and optional router replica sync) are distributed/persisted for KV-aware routing.

Note: If you use the tcp or http request plane with KV events enabled (the default), NATS is initialized automatically. Set the NATS_SERVER environment variable (e.g., NATS_SERVER=nats://nats-hostname:port) to point at a custom NATS server; otherwise it defaults to localhost:4222. To disable NATS completely, pass --no-kv-events to the frontend.

Because they are independent, you can mix them.

For example, a deployment with TCP request plane can use different KV event planes:

  • JetStream KV events: requests use TCP, KV routing still uses NATS JetStream + object store for persistence.

  • NATS Core KV events (local indexer): requests use TCP, KV events use NATS Core pub/sub and persistence lives on workers.

  • no KV events: requests use TCP and KV routing runs without events (no NATS required, but no event-backed persistence).
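The combinations above reduce to one rule: NATS is needed when the request plane is nats or when KV events are enabled. The sketch below encodes that rule as a planning aid. The function names needs_nats and nats_address are hypothetical helpers, not part of Dynamo, and the "on"/"off" argument is an illustrative stand-in for the presence or absence of --no-kv-events.

```shell
# Hypothetical helper (not part of Dynamo): reports whether a deployment
# needs a reachable NATS server, following the rules in this guide.
#   $1 = request plane (tcp, http, or nats)
#   $2 = "on" if KV events are enabled (the default), "off" for --no-kv-events
needs_nats() {
    if [ "$1" = "nats" ] || [ "${2:-on}" = "on" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# Address used for NATS when NATS_SERVER is unset (default stated in this guide).
nats_address() {
    echo "${NATS_SERVER:-localhost:4222}"
}

needs_nats tcp on     # TCP requests, KV events carried over NATS
needs_nats tcp off    # TCP requests, --no-kv-events
needs_nats nats off   # the NATS request plane itself needs NATS
```

If the answer is "yes", nats_address shows which server the deployment would look for.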

Configuration#

Environment Variable#

Set the request plane mode using the DYN_REQUEST_PLANE environment variable:

export DYN_REQUEST_PLANE=<mode>

Where <mode> is one of:

  • tcp (default)

  • nats

  • http

The value is case-insensitive.

Default Behavior#

If DYN_REQUEST_PLANE is not set or contains an invalid value, Dynamo defaults to tcp.
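To preview which mode a given environment would select, the documented rules (case-insensitive value, fallback to tcp) can be mirrored in a few lines of shell. This is a minimal sketch of the behavior described here, not Dynamo's own parsing code; resolve_request_plane is a hypothetical name.

```shell
# Hypothetical helper (not part of Dynamo): prints the request plane mode
# that the rules in this section would select for the current environment.
resolve_request_plane() {
    # Unset or empty falls back to tcp; the value is compared case-insensitively.
    mode=$(printf '%s' "${DYN_REQUEST_PLANE:-tcp}" | tr '[:upper:]' '[:lower:]')
    case "$mode" in
        tcp|http|nats) printf '%s\n' "$mode" ;;
        *)             printf '%s\n' "tcp" ;;   # invalid values fall back to tcp
    esac
}

echo "effective request plane: $(resolve_request_plane)"
```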

Usage Examples#

Using TCP (Default)#

TCP is the default request plane and provides direct, low-latency communication between services.

Configuration:

# TCP is the default, so no need to set DYN_REQUEST_PLANE explicitly
# But you can explicitly set it if desired:
export DYN_REQUEST_PLANE=tcp

# Optional: Configure TCP server host and port
export DYN_TCP_RPC_HOST=0.0.0.0  # Default host
# export DYN_TCP_RPC_PORT=9999   # Optional: specify a fixed port

# Run your Dynamo service
DYN_REQUEST_PLANE=tcp python -m dynamo.frontend --http-port=8000 &
DYN_REQUEST_PLANE=tcp python -m dynamo.vllm --model Qwen/Qwen3-0.6B

Note: By default, TCP uses an OS-assigned free port (port 0). This is ideal for environments where multiple services may run on the same machine or when you want to avoid port conflicts. If you need a specific port (e.g., for firewall rules), set DYN_TCP_RPC_PORT explicitly.

When to use TCP:

  • Simple deployments with direct service-to-service communication (e.g. frontend to backend)

  • Minimal infrastructure requirements (NATS is initialized by default for KV events but can be disabled with --no-kv-events)

  • Low-latency requirements

TCP Configuration Options:

Additional TCP-specific environment variables:

  • DYN_TCP_RPC_HOST: Server host address (default: auto-detected)

  • DYN_TCP_RPC_PORT: Server port. If not set, the OS assigns a free port automatically (recommended for most deployments). Set explicitly only if you need a specific port for firewall rules.

  • DYN_TCP_MAX_MESSAGE_SIZE: Maximum message size for TCP client (default: 32MB)

  • DYN_TCP_REQUEST_TIMEOUT: Request timeout for TCP client (default: 10 seconds)

  • DYN_TCP_POOL_SIZE: Connection pool size for TCP client (default: 50)

  • DYN_TCP_CONNECT_TIMEOUT: Connect timeout for TCP client (default: 3 seconds)

  • DYN_TCP_CHANNEL_BUFFER: Request channel buffer size for TCP client (default: 100)
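When debugging a deployment it helps to see which of these variables are actually set and which fall back to a default. The sketch below prints each one using the defaults listed above. show_setting is a hypothetical helper, and the default strings are descriptive labels copied from this guide, not values in the format Dynamo expects.

```shell
# Hypothetical helper (not part of Dynamo): prints NAME=value when the
# variable is exported, otherwise the default documented in this guide.
#   $1 = variable name, $2 = human-readable documented default
show_setting() {
    if value=$(printenv "$1"); then
        printf '%s=%s\n' "$1" "$value"
    else
        printf '%s is unset (documented default: %s)\n' "$1" "$2"
    fi
}

show_setting DYN_TCP_RPC_HOST         "auto-detected"
show_setting DYN_TCP_RPC_PORT         "OS-assigned free port"
show_setting DYN_TCP_MAX_MESSAGE_SIZE "32MB"
show_setting DYN_TCP_REQUEST_TIMEOUT  "10 seconds"
show_setting DYN_TCP_POOL_SIZE        "50"
show_setting DYN_TCP_CONNECT_TIMEOUT  "3 seconds"
show_setting DYN_TCP_CHANNEL_BUFFER   "100"
```

Because printenv only sees exported variables, a value assigned without export is reported as unset, which matches what a child Dynamo process would observe.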

Using HTTP#

HTTP/2 provides a standards-based request plane that’s easy to debug and widely compatible.

Configuration:

# Optional: Configure HTTP server host and port
export DYN_HTTP_RPC_HOST=0.0.0.0      # Default host
export DYN_HTTP_RPC_PORT=8888         # Default port
export DYN_HTTP_RPC_ROOT_PATH=/v1/rpc # Default path

# Run your Dynamo service
DYN_REQUEST_PLANE=http python -m dynamo.frontend --http-port=8000 &
DYN_REQUEST_PLANE=http python -m dynamo.vllm --model Qwen/Qwen3-0.6B

When to use HTTP:

  • Standard deployments requiring HTTP compatibility

  • Debugging scenarios (use curl, browser tools, etc.)

  • Integration with HTTP-based infrastructure

  • Load balancers and proxies that work with HTTP

HTTP Configuration Options:

Additional HTTP-specific environment variables:

  • DYN_HTTP_RPC_HOST: Server host address (default: auto-detected)

  • DYN_HTTP_RPC_PORT: Server port (default: 8888)

  • DYN_HTTP_RPC_ROOT_PATH: Root path for RPC endpoints (default: /v1/rpc)

HTTP/2 client configuration options (DYN_HTTP2_*):

  • DYN_HTTP2_MAX_FRAME_SIZE: Maximum frame size for HTTP client (default: 1MB)

  • DYN_HTTP2_MAX_CONCURRENT_STREAMS: Maximum concurrent streams for HTTP client (default: 1000)

  • DYN_HTTP2_POOL_MAX_IDLE_PER_HOST: Maximum idle connections per host for HTTP client (default: 100)

  • DYN_HTTP2_POOL_IDLE_TIMEOUT_SECS: Idle timeout for HTTP client (default: 90 seconds)

  • DYN_HTTP2_KEEP_ALIVE_INTERVAL_SECS: Keep-alive interval for HTTP client (default: 30 seconds)

  • DYN_HTTP2_KEEP_ALIVE_TIMEOUT_SECS: Keep-alive timeout for HTTP client (default: 10 seconds)

  • DYN_HTTP2_ADAPTIVE_WINDOW: Enable adaptive flow control (default: true)
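Two different ports are involved in HTTP mode: the frontend's --http-port, which serves client traffic, and DYN_HTTP_RPC_PORT, which serves the request plane. A small pre-flight check such as the one below can catch the case where both are set to the same value on one host. This is an illustrative sketch; rpc_port_clashes_with is a hypothetical helper, and it assumes both servers would bind on the same machine.

```shell
# Hypothetical pre-flight check (not part of Dynamo).
#   $1 = port passed to the frontend via --http-port
# Succeeds (exit 0) when that port equals the request plane RPC port.
rpc_port_clashes_with() {
    rpc_port=${DYN_HTTP_RPC_PORT:-8888}   # 8888 is the documented default
    [ "$1" = "$rpc_port" ]
}

if rpc_port_clashes_with 8000; then
    echo "port 8000 is configured for both the frontend and the HTTP request plane"
else
    echo "frontend port 8000 and request plane port ${DYN_HTTP_RPC_PORT:-8888} are distinct"
fi
```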

Using NATS#

NATS provides durable JetStream messaging for the request plane and can also carry KV events (and router replica sync).

Prerequisites:

  • NATS server must be running and accessible

  • Configure the NATS connection via the standard Dynamo NATS environment variables (e.g., NATS_SERVER)

# Explicitly set to NATS
export DYN_REQUEST_PLANE=nats

# Run your Dynamo service
DYN_REQUEST_PLANE=nats python -m dynamo.frontend --http-port=8000 &
DYN_REQUEST_PLANE=nats python -m dynamo.vllm --model Qwen/Qwen3-0.6B

When to use NATS:

  • Production deployments with service discovery

  • KV-aware routing with accurate cache state tracking (requires NATS for event transport). Note: approximate mode (--no-kv-events) provides KV routing without NATS but with reduced accuracy.

  • Need for message replay and persistence features

Limitations:

  • NATS does not support payloads beyond 16MB (use TCP for larger payloads)
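Since the request plane is chosen per deployment, the payload limit is best checked against the largest request you expect. The sketch below shows one way to do that at planning time. plane_for_payload is a hypothetical helper, and it assumes "16MB" means 16 MiB (16 * 1024 * 1024 bytes), which this guide does not state explicitly.

```shell
# Assumption: the 16MB NATS limit is interpreted as 16 MiB.
NATS_MAX_PAYLOAD_BYTES=$((16 * 1024 * 1024))

# Hypothetical helper (not part of Dynamo): prints the request plane to use.
#   $1 = largest expected payload in bytes, $2 = preferred request plane
# Keeps the preferred plane unless it is nats and the payload exceeds the limit.
plane_for_payload() {
    if [ "$2" = "nats" ] && [ "$1" -gt "$NATS_MAX_PAYLOAD_BYTES" ]; then
        echo "tcp"
    else
        echo "$2"
    fi
}

plane_for_payload 1048576 nats     # 1 MiB fits within the NATS limit
plane_for_payload 20971520 nats    # 20 MiB exceeds it, so fall back to tcp
```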

Complete Example#

The script examples/backends/vllm/launch/agg_request_planes.sh is a complete working example that launches a Dynamo deployment with the TCP, HTTP, or NATS request plane.

Real-World Example#

The Dynamo repository includes a complete example demonstrating all three request planes:

Location: examples/backends/vllm/launch/agg_request_planes.sh

cd examples/backends/vllm/launch

# Run with TCP
./agg_request_planes.sh --tcp

# Run with HTTP
./agg_request_planes.sh --http

# Run with NATS
./agg_request_planes.sh --nats

Architecture Details#

Network Manager#

The request plane implementation is centralized in the Network Manager (lib/runtime/src/pipeline/network/manager.rs), which:

  1. Reads the DYN_REQUEST_PLANE environment variable at startup

  2. Creates the appropriate server and client implementations

  3. Provides a transport-agnostic interface to the rest of the codebase

  4. Manages all network configuration and lifecycle

Transport Abstraction#

All request plane implementations conform to common trait interfaces:

  • RequestPlaneServer: Server-side interface for receiving requests

  • RequestPlaneClient: Client-side interface for sending requests

This abstraction means your application code doesn’t need to change when switching request planes.

Configuration Loading#

Request plane configuration is loaded from environment variables at startup and cached globally. The configuration hierarchy is:

  1. Mode Selection: DYN_REQUEST_PLANE (defaults to tcp)

  2. Transport-Specific Config: Mode-specific environment variables (e.g., DYN_TCP_*, DYN_HTTP2_*)
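Because the configuration is read once at startup, it is worth inspecting the environment before launching a service. The following sketch lists the request plane variables named in this guide that are currently set. show_request_plane_env is a hypothetical helper, and the variable prefixes in the pattern are taken from this guide only.

```shell
# Hypothetical helper (not part of Dynamo): lists the request plane
# variables from this guide that are set in the current environment.
show_request_plane_env() {
    env | grep -E '^DYN_(REQUEST_PLANE=|TCP_|HTTP_RPC_|HTTP2_)' | sort
}

settings=$(show_request_plane_env || true)
if [ -n "$settings" ]; then
    printf '%s\n' "$settings"
else
    echo "no request plane variables set; documented defaults apply"
fi
```

Run it in the same shell that will start the service, since changes made after startup are not picked up.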

Migration Guide#

From NATS to TCP#

  1. Stop your Dynamo services

  2. Set environment variable DYN_REQUEST_PLANE=tcp

  3. Optionally configure TCP-specific settings (e.g., DYN_TCP_RPC_HOST). Note: DYN_TCP_RPC_PORT is optional; if not set, an OS-assigned free port is used automatically.

  4. Restart your services

From NATS to HTTP#

  1. Stop your Dynamo services

  2. Set environment variable DYN_REQUEST_PLANE=http

  3. Optionally configure HTTP-specific settings (DYN_HTTP_RPC_PORT, etc.)

  4. Restart your services

Testing the Migration#

After switching request planes, verify your deployment:

# Test with a simple request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
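For scripted checks, the same request can be reduced to a pass/fail result based on the HTTP status code. The sketch below wraps the curl call above; chat_status and verify_frontend are hypothetical helper names, and treating only HTTP 200 as success is an assumption of this example.

```shell
# Hypothetical smoke test (not part of Dynamo).
#   $1 = frontend base URL, $2 = model name
# Prints only the HTTP status code of a chat completion request.
chat_status() {
    curl -s -o /dev/null -w '%{http_code}' \
        -H "Content-Type: application/json" \
        -d "{\"model\": \"$2\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}" \
        "$1/v1/chat/completions"
}

# Succeeds only when the frontend answers with HTTP 200.
verify_frontend() {
    status=$(chat_status "$1" "$2") || true
    if [ "$status" = "200" ]; then
        echo "request plane OK (HTTP $status)"
    else
        echo "request failed (HTTP ${status:-no response})" >&2
        return 1
    fi
}
```

A typical call after a migration would be verify_frontend http://localhost:8000 Qwen/Qwen3-0.6B, run once the frontend and worker have both restarted.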

Troubleshooting#

Issue: Services Can’t Communicate#

Symptoms: Requests time out or fail to reach the backend

Solutions:

  • Verify all services use the same DYN_REQUEST_PLANE setting

  • Check that server ports are not blocked by Kubernetes network policies or firewalls

  • For TCP/HTTP: Ensure host/port configurations are correct and accessible

  • For NATS: Verify NATS server is running and accessible
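The first check in the list can be automated once you have collected the DYN_REQUEST_PLANE value from each service. The sketch below compares them case-insensitively and treats an empty value as tcp, in line with the default described in this guide. planes_match is a hypothetical helper, and how the values are gathered from each service is left to your tooling.

```shell
# Hypothetical helper (not part of Dynamo).
#   $@ = DYN_REQUEST_PLANE value of each service ("" if the variable is unset)
# Succeeds when every service resolves to the same mode.
planes_match() {
    first=""
    for raw in "$@"; do
        # Empty means unset, which this guide documents as tcp.
        mode=$(printf '%s' "${raw:-tcp}" | tr '[:upper:]' '[:lower:]')
        if [ -z "$first" ]; then
            first=$mode
        elif [ "$mode" != "$first" ]; then
            echo "mismatch: $first vs $mode" >&2
            return 1
        fi
    done
}

# Example: frontend sets "tcp", one worker sets "TCP", another leaves it unset.
if planes_match tcp TCP ""; then
    echo "all services use the same request plane"
fi
```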

Issue: “Invalid request plane mode” Error#

Symptoms: Service fails to start with configuration error

Solutions:

  • Check DYN_REQUEST_PLANE spelling (valid values: nats, tcp, http)

  • Value is case-insensitive but must be one of the three options

  • If not set, defaults to tcp

Issue: Port Conflicts#

Symptoms: Server fails to start due to “address already in use”

Solutions:

  • TCP: By default, TCP uses an OS-assigned free port, so port conflicts should be rare. If you explicitly set DYN_TCP_RPC_PORT to a specific port and get conflicts, either change the port or remove the setting to use automatic port assignment.

  • HTTP: The default port is 8888. If it is already in use, set DYN_HTTP_RPC_PORT to a free port.

Performance Considerations#

Latency#

  • TCP: Lowest latency due to direct connections and binary serialization

  • HTTP: Moderate latency with HTTP/2 overhead

  • NATS: Moderate latency due to NATS JetStream persistence

Resource Usage#

  • TCP: Minimal infrastructure (NATS required only if using KV events, can disable with --no-kv-events)

  • HTTP: Minimal infrastructure (NATS required only if using KV events, can disable with --no-kv-events)

  • NATS: Requires running NATS server (additional memory/CPU)