Copyright © 2026, NVIDIA Corporation.

Testing

This document describes the test infrastructure for validating Dynamo’s fault tolerance mechanisms. The testing framework supports request cancellation, migration, etcd HA, and hardware fault injection scenarios.

Overview

Dynamo’s fault tolerance test suite is located in tests/fault_tolerance/ and includes:

Test Category   Location        Purpose
Cancellation    cancellation/   Request cancellation during in-flight operations
Migration       migration/      Request migration when workers fail
etcd HA         etcd_ha/        etcd failover and recovery
Hardware        hardware/       GPU and network fault injection
Deployment      deploy/         End-to-end deployment testing

Test Directory Structure

tests/fault_tolerance/
├── cancellation/
│   ├── test_vllm.py
│   ├── test_trtllm.py
│   ├── test_sglang.py
│   └── utils.py
├── migration/
│   ├── test_vllm.py
│   ├── test_trtllm.py
│   ├── test_sglang.py
│   └── utils.py
├── etcd_ha/
│   ├── test_vllm.py
│   ├── test_trtllm.py
│   ├── test_sglang.py
│   └── utils.py
├── hardware/
│   └── fault_injection_service/
│       ├── api_service/
│       └── agents/
├── deploy/
│   ├── test_deployment.py
│   ├── scenarios.py
│   ├── base_checker.py
│   └── ...
└── client.py

Request Cancellation Tests

These tests verify that in-flight requests can be canceled cleanly.

Running Cancellation Tests

# Run all cancellation tests
pytest tests/fault_tolerance/cancellation/ -v

# Run for a specific backend
pytest tests/fault_tolerance/cancellation/test_vllm.py -v

Cancellation Test Utilities

The cancellation/utils.py module provides:

CancellableRequest

Thread-safe request cancellation via TCP socket manipulation:

import time
from threading import Thread

from tests.fault_tolerance.cancellation.utils import CancellableRequest

request = CancellableRequest()

# Send the request in a separate thread.
# send_request is a caller-supplied function that issues the HTTP call.
thread = Thread(target=send_request, args=(request,))
thread.start()

# Cancel after some time
time.sleep(1)
request.cancel()  # Closes the underlying socket
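
The idea of cancelling a request by closing its socket can be sketched as follows. This is a minimal illustration and not the class in cancellation/utils.py: the names SocketCancelHandle and attach are hypothetical, and a local socket pair stands in for the HTTP connection.

```python
import socket
import threading


class SocketCancelHandle:
    """Hypothetical stand-in for a cancellable request handle."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sock = None
        self._cancelled = False

    def attach(self, sock):
        """Register the socket carrying the request; close it if already cancelled."""
        with self._lock:
            if self._cancelled:
                sock.close()
            else:
                self._sock = sock

    def cancel(self):
        """Close the attached socket so the peer sees the connection drop."""
        with self._lock:
            self._cancelled = True
            if self._sock is not None:
                try:
                    self._sock.shutdown(socket.SHUT_RDWR)
                except OSError:
                    pass  # the socket was already disconnected
                self._sock.close()
                self._sock = None

    @property
    def cancelled(self):
        with self._lock:
            return self._cancelled


# Demonstrate with a local socket pair instead of a real HTTP connection.
client_end, server_end = socket.socketpair()
handle = SocketCancelHandle()
handle.attach(client_end)
handle.cancel()
peer_data = server_end.recv(1024)  # empty bytes: the peer observed the close
server_end.close()
```

The lock makes it safe to call cancel() from a different thread than the one that attached the socket, which is the situation in the threaded example above.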

send_completion_request / send_chat_completion_request

Send cancellable completion requests:

from tests.fault_tolerance.cancellation.utils import (
    CancellableRequest,
    send_completion_request,
    send_chat_completion_request,
)

# Non-streaming
response = send_completion_request(
    base_url="http://localhost:8000",
    model="Qwen/Qwen3-0.6B",
    prompt="Hello, world!",
    max_tokens=100,
)

# Streaming with cancellation
request = CancellableRequest()
responses = send_chat_completion_request(
    base_url="http://localhost:8000",
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    cancellable_request=request,
)

poll_for_pattern

Wait for specific patterns in logs:

from tests.fault_tolerance.cancellation.utils import poll_for_pattern

# Wait for cancellation confirmation
found = poll_for_pattern(
    log_file="/var/log/dynamo/worker.log",
    pattern="Request cancelled",
    timeout=30,
    interval=0.5,
)
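
For readers who want to see what this kind of polling involves, here is a minimal sketch. It is not the implementation of poll_for_pattern; the function name wait_for_log_pattern is hypothetical, and it assumes the pattern is a regular expression and that re-reading the whole file on each poll is acceptable for test-sized logs.

```python
import re
import time
from pathlib import Path


def wait_for_log_pattern(log_file, pattern, timeout=30.0, interval=0.5):
    """Return True once `pattern` (a regex) appears in `log_file`, False at timeout."""
    regex = re.compile(pattern)
    path = Path(log_file)
    deadline = time.monotonic() + timeout
    while True:
        # The file may not exist yet if the worker has not started logging.
        if path.exists() and regex.search(path.read_text(errors="replace")):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Using time.monotonic() for the deadline keeps the timeout correct even if the wall clock is adjusted during the test.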

Migration Tests

These tests verify that requests migrate to healthy workers when a worker fails.

Running Migration Tests

# Run all migration tests
pytest tests/fault_tolerance/migration/ -v

# Run for a specific backend
pytest tests/fault_tolerance/migration/test_vllm.py -v

Migration Test Utilities

The migration/utils.py module provides:

  • Frontend wrapper with configurable request planes
  • Long-running request spawning for migration scenarios
  • Health check disabling for controlled testing

Example Migration Test

# Illustrative outline: start_deployment, spawn_long_request, and kill_worker
# are placeholders for helpers supplied by the test harness.
def test_migration_on_worker_failure():
    # Start deployment with 2 workers
    deployment = start_deployment(workers=2)

    # Send long-running request
    request_thread = spawn_long_request(max_tokens=1000)

    # Kill one worker mid-generation
    kill_worker(deployment.workers[0])

    # Verify request completes on remaining worker.
    # join() is assumed to return the response here
    # (a plain threading.Thread.join() returns None).
    response = request_thread.join()
    assert response.status_code == 200
    assert len(response.tokens) > 0
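
A plain threading.Thread does not return a value from join(), so a helper such as spawn_long_request needs some way to hand the response back to the test. One option is a Future, sketched below. The names spawn_in_background and fake_long_request are hypothetical, and fake_long_request simulates the HTTP call instead of contacting a frontend.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_long_request(max_tokens):
    """Hypothetical stand-in for a long-running streaming HTTP request."""
    time.sleep(0.05)
    return {"status_code": 200, "tokens": ["tok"] * max_tokens}


def spawn_in_background(fn, *args):
    """Run fn(*args) in a worker thread and return a Future for its result."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args)
    executor.shutdown(wait=False)  # do not block; the submitted call still runs
    return future


future = spawn_in_background(fake_long_request, 8)
# ... a real test would kill a worker here, while the request is in flight ...
response = future.result(timeout=5)
```

future.result() also re-raises any exception from the request thread, so a failed request surfaces in the test instead of being lost.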

etcd HA Tests

These tests exercise system behavior during etcd failures and recovery.

Running etcd HA Tests

$pytest tests/fault_tolerance/etcd_ha/ -v

Test Scenarios

  • Leader failover: the etcd leader node fails and the cluster elects a new leader
  • Network partition: an etcd node becomes unreachable
  • Recovery: the system recovers once etcd becomes available again

Hardware Fault Injection

The fault injection service enables testing under simulated hardware failures.

Fault Injection Service

Located at tests/fault_tolerance/hardware/fault_injection_service/, this FastAPI service orchestrates fault injection:

# Start the fault injection service
cd tests/fault_tolerance/hardware/fault_injection_service
python -m api_service.main

Supported Fault Types

GPU Faults

Fault Type         Description
XID_ERROR          Simulate GPU XID error (various codes)
THROTTLE           GPU thermal throttling
MEMORY_PRESSURE    GPU memory exhaustion
OVERHEAT           GPU overheating condition
COMPUTE_OVERLOAD   GPU compute saturation

Network Faults

Fault Type        Description
FRONTEND_WORKER   Partition between frontend and workers
WORKER_NATS       Partition between workers and NATS
WORKER_WORKER     Partition between workers
CUSTOM            Custom network partition

Fault Injection API

Inject GPU Fault

curl -X POST http://localhost:8080/api/v1/faults/gpu/inject \
  -H "Content-Type: application/json" \
  -d '{
    "target_pod": "vllm-worker-0",
    "fault_type": "XID_ERROR",
    "severity": "HIGH"
  }'

Inject Specific XID Error

# Inject XID 79 (GPU has fallen off the bus)
curl -X POST http://localhost:8080/api/v1/faults/gpu/inject/xid-79 \
  -H "Content-Type: application/json" \
  -d '{"target_pod": "vllm-worker-0"}'

Supported XID codes: 43, 48, 74, 79, 94, 95, 119, 120

Inject Network Partition

curl -X POST http://localhost:8080/api/v1/faults/network/inject \
  -H "Content-Type: application/json" \
  -d '{
    "partition_type": "FRONTEND_WORKER",
    "duration_seconds": 30
  }'

Recover from Fault

curl -X POST http://localhost:8080/api/v1/faults/{fault_id}/recover

List Active Faults

curl http://localhost:8080/api/v1/faults
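
Inside a Python test it can be more convenient to call these endpoints directly than to shell out to curl. The sketch below mirrors the curl examples using only the standard library. The helper names are hypothetical, and the send helper assumes the service replies with a JSON body, which the examples above do not confirm.

```python
import json
import urllib.request

FAULT_API = "http://localhost:8080/api/v1/faults"  # base URL from the curl examples


def build_gpu_fault_request(target_pod, fault_type, severity="HIGH"):
    """Prepare the POST that injects a GPU fault (same payload as the curl example)."""
    body = json.dumps(
        {"target_pod": target_pod, "fault_type": fault_type, "severity": severity}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{FAULT_API}/gpu/inject",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_recover_request(fault_id):
    """Prepare the POST that recovers a previously injected fault."""
    return urllib.request.Request(f"{FAULT_API}/{fault_id}/recover", method="POST")


def send(request, timeout=10):
    """Send a prepared request; assumes (unverified) that the response is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building a request does not contact the service; call send(req) to do that.
req = build_gpu_fault_request("vllm-worker-0", "XID_ERROR")
```

Separating request construction from sending keeps the payload logic testable without a running fault injection service.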

GPU Fault Injector Agent

The GPU fault injector runs as a DaemonSet on worker nodes:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-fault-injector
spec:
  selector:
    matchLabels:
      app: gpu-fault-injector
  template:
    metadata:
      labels:
        app: gpu-fault-injector  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: agent
          image: dynamo/gpu-fault-injector:latest
          securityContext:
            privileged: true
          volumeMounts:
            - name: dev
              mountPath: /dev
      volumes:
        - name: dev
          hostPath:
            path: /dev  # assumed: the host's /dev, needed to reach /dev/kmsg

The agent injects fake XID messages via /dev/kmsg to trigger NVSentinel detection.

Deployment Testing Framework

The deploy/ directory contains an end-to-end testing framework.

Test Phases

Tests run through three phases:

Phase      Description
STANDARD   Baseline performance under normal conditions
OVERFLOW   System behavior during fault/overload
RECOVERY   System recovery after fault resolution
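
When analysing results it can help to tag each request with the phase it fell into. The sketch below assumes the phases are delimited by the times at which the fault was injected and resolved. That boundary rule, the Phase enum, and the phase_for function are illustrative assumptions and are not taken from the framework.

```python
from enum import Enum


class Phase(Enum):
    STANDARD = "standard"
    OVERFLOW = "overflow"
    RECOVERY = "recovery"


def phase_for(timestamp, fault_start, fault_end):
    """Classify a request timestamp relative to the fault window (assumed rule)."""
    if timestamp < fault_start:
        return Phase.STANDARD
    if timestamp < fault_end:
        return Phase.OVERFLOW
    return Phase.RECOVERY


# Fault injected at t=60s and resolved at t=90s (made-up numbers).
phases = [phase_for(t, fault_start=60.0, fault_end=90.0) for t in (10.0, 75.0, 120.0)]
```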

Scenario Configuration

Define test scenarios in scenarios.py:

from tests.fault_tolerance.deploy.scenarios import Scenario, Load, Failure

scenario = Scenario(
    name="worker_failure_migration",
    backend="vllm",
    load=Load(
        clients=10,
        requests_per_client=100,
        max_tokens=256,
    ),
    failure=Failure(
        type="pod_kill",
        target="vllm-worker-0",
        trigger_after_requests=50,
    ),
)

Running Deployment Tests

# Run all deployment tests
pytest tests/fault_tolerance/deploy/test_deployment.py -v

# Run a specific scenario
pytest tests/fault_tolerance/deploy/test_deployment.py::test_worker_failure -v

Validation Checkers

The framework includes pluggable validators:

from tests.fault_tolerance.deploy.base_checker import BaseChecker, ValidationContext

class MigrationChecker(BaseChecker):
    def check(self, context: ValidationContext) -> bool:
        # Verify migrations occurred
        migrations = context.metrics.get("migrations_total", 0)
        return migrations > 0
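
To show how several pluggable validators might be combined into one verdict, here is a self-contained sketch. It does not import the real BaseChecker or ValidationContext: FakeContext, the two checker classes, the name attribute, and run_checkers are all hypothetical stand-ins that only share the check(context) shape shown above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeContext:
    """Hypothetical stand-in for ValidationContext; only carries metrics."""
    metrics: dict = field(default_factory=dict)


class FakeMigrationChecker:
    name = "migration"

    def check(self, context):
        # Passes if at least one migration was recorded.
        return context.metrics.get("migrations_total", 0) > 0


class FakeSuccessRateChecker:
    name = "success_rate"

    def __init__(self, minimum):
        self.minimum = minimum

    def check(self, context):
        # Passes if the observed success rate meets the threshold.
        return context.metrics.get("success_rate", 0.0) >= self.minimum


def run_checkers(checkers, context):
    """Run every checker and report pass/fail per checker name."""
    return {checker.name: checker.check(context) for checker in checkers}


context = FakeContext(metrics={"migrations_total": 3, "success_rate": 0.97})
report = run_checkers([FakeMigrationChecker(), FakeSuccessRateChecker(0.99)], context)
passed = all(report.values())
```

Reporting a result per checker, and not just a single boolean, makes it easier to see which validation failed when a scenario does not pass.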

Results Parsing

Parse test results for analysis:

from tests.fault_tolerance.deploy.parse_results import process_overflow_recovery_test

results = process_overflow_recovery_test(log_dir="/path/to/logs")
print(f"Success rate: {results['success_rate']}")
print(f"P99 latency: {results['p99_latency_ms']}ms")
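
The two figures printed above can be derived from per-request records roughly as follows. This sketch is not the logic of parse_results: the record layout (ok, latency_ms), the summarize function, and the choice of the nearest-rank percentile over successful requests only are assumptions made for illustration.

```python
import math


def summarize(records):
    """Compute success rate and nearest-rank P99 latency over successful requests."""
    total = len(records)
    latencies = sorted(r["latency_ms"] for r in records if r["ok"])
    p99 = None
    if latencies:
        rank = math.ceil(0.99 * len(latencies))  # 1-based nearest-rank index
        p99 = latencies[rank - 1]
    success_rate = len(latencies) / total if total else 0.0
    return {"success_rate": success_rate, "p99_latency_ms": p99}


# 100 successful requests with latencies 1..100 ms, plus 25 failures.
records = [{"ok": True, "latency_ms": float(i)} for i in range(1, 101)]
records += [{"ok": False, "latency_ms": None} for _ in range(25)]
summary = summarize(records)
```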

Client Utilities

The client.py module provides shared client functionality:

Multi-Threaded Load Generation

from tests.fault_tolerance.client import client

# Generate load with multiple clients
results = client(
    base_url="http://localhost:8000",
    num_clients=10,
    requests_per_client=100,
    model="Qwen/Qwen3-0.6B",
    max_tokens=256,
    log_dir="/tmp/test_logs",
)

Request Options

Parameter             Description
base_url              Frontend URL
num_clients           Number of concurrent clients
requests_per_client   Requests per client
model                 Model name
max_tokens            Max tokens per request
log_dir               Directory for client logs
endpoint              completions or chat/completions
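
The general pattern behind multi-threaded load generation, with num_clients threads each sending requests_per_client requests in sequence, can be sketched with a thread pool. This is not the code in client.py: run_load and the injected send_fn are hypothetical, and send_fn stands in for whatever issues the HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor


def run_load(send_fn, num_clients, requests_per_client):
    """Run num_clients concurrent clients; each sends its requests sequentially."""

    def client_loop(client_id):
        outcomes = []
        for index in range(requests_per_client):
            try:
                result = send_fn(client_id, index)
                outcomes.append({"client": client_id, "index": index,
                                 "result": result, "error": None})
            except Exception as exc:  # record the failure and keep generating load
                outcomes.append({"client": client_id, "index": index,
                                 "result": None, "error": repr(exc)})
        return outcomes

    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        per_client = list(pool.map(client_loop, range(num_clients)))
    return [item for outcomes in per_client for item in outcomes]


def fake_send(client_id, index):
    """Hypothetical request function; fails on one request to show error capture."""
    if (client_id, index) == (1, 2):
        raise RuntimeError("simulated worker failure")
    return 200


outcomes = run_load(fake_send, num_clients=3, requests_per_client=4)
failures = [o for o in outcomes if o["error"] is not None]
```

Catching exceptions per request matters in fault tolerance tests, since some failures are expected while a fault is active and the client should keep sending.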

Running the Full Test Suite

Prerequisites

  1. Kubernetes cluster with GPU nodes
  2. Dynamo deployment
  3. etcd cluster (for HA tests)
  4. Fault injection service (for hardware tests)

Environment Setup

export KUBECONFIG=/path/to/kubeconfig
export DYNAMO_NAMESPACE=dynamo-test
export FRONTEND_URL=http://localhost:8000

Run All Tests

# Install test dependencies
pip install pytest pytest-asyncio

# Run all fault tolerance tests
pytest tests/fault_tolerance/ -v --tb=short

# Run with specific markers
pytest tests/fault_tolerance/ -v -m "not slow"

Test Markers

Marker    Description
slow      Long-running tests (> 5 minutes)
gpu       Requires GPU resources
k8s       Requires Kubernetes cluster
etcd_ha   Requires multi-node etcd
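
Custom markers need to be registered with pytest, otherwise pytest warns about unknown marks (and rejects them under --strict-markers). The repository may already do this in its own configuration. If it did not, a conftest.py hook along these lines would register the markers from the table above. The FAULT_TOLERANCE_MARKERS dictionary is a hypothetical name, while pytest_configure and config.addinivalue_line are standard pytest APIs.

```python
# Hypothetical conftest.py fragment registering the markers listed above.
FAULT_TOLERANCE_MARKERS = {
    "slow": "long-running tests (> 5 minutes)",
    "gpu": "requires GPU resources",
    "k8s": "requires Kubernetes cluster",
    "etcd_ha": "requires multi-node etcd",
}


def pytest_configure(config):
    """pytest hook: register each marker so `-m` expressions can select on it."""
    for name, description in FAULT_TOLERANCE_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, an expression such as -m "gpu and not slow" selects only the short GPU tests.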

Best Practices

1. Isolate Test Environments

Run fault tolerance tests in dedicated namespaces:

kubectl create namespace dynamo-fault-test

2. Clean Up After Tests

Make sure every injected fault is recovered before the next test runs:

# List and recover all active faults
curl http://localhost:8080/api/v1/faults | jq -r '.[].id' | \
  xargs -I {} curl -X POST http://localhost:8080/api/v1/faults/{}/recover

3. Collect Logs

Preserve logs for debugging:

pytest tests/fault_tolerance/ -v \
  --log-dir=/tmp/fault_test_logs \
  --capture=no

4. Monitor During Tests

Watch system state during tests:

# Terminal 1: Watch pods
watch kubectl get pods -n dynamo-test

# Terminal 2: Watch metrics
watch 'curl -s localhost:8000/metrics | grep -E "(migration|rejection)"'

Related Documentation

  • Request Migration - Migration implementation details
  • Request Cancellation - Cancellation implementation
  • Health Checks - Health monitoring
  • Metrics - Available metrics for monitoring