For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Digest
  • Getting Started
    • Quickstart
    • Installation
    • Support Matrix
    • Feature Matrix
    • Examples
  • Kubernetes Deployment
  • User Guides
    • Tool Calling
    • Multimodality Support
    • Finding Best Initial Configs
    • Dynamo Benchmarking Guide
    • Tuning Disaggregated Performance
    • Writing Python Workers in Dynamo
    • Glossary
  • Components
    • Router
      • Overview
      • Motivation
      • Architecture
      • Components
      • Design Deep Dive
      • Integrations
      • KVBM in vLLM
      • KVBM in TRTLLM
      • LMCache Integration
      • Further Reading
  • Design Docs
    • Overall Architecture
    • Architecture Flow
    • Disaggregated Serving
    • Distributed Runtime
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoDocumentation
Digest
On this page
  • Feature Support Matrix
ComponentsKVBM

KV Block Manager

||View as Markdown|

The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to handle memory allocation, management, and remote sharing of Key-Value (KV) blocks for inference tasks across heterogeneous and distributed environments. It acts as a unified memory layer for frameworks like vLLM, SGLang, and TRT-LLM.

It offers:

  • A unified memory API that spans GPU memory (in future), pinned host memory, remote RDMA-accessible memory, local or distributed pool of SSDs and remote file/object/cloud storage systems.
  • Support for evolving block lifecycles (allocate → register → match) with event-based state transitions that storage can subscribe to.
  • Integration with NIXL, a dynamic memory exchange layer used for remote registration, sharing, and access of memory blocks over RDMA/NVLink.

The Dynamo KV Block Manager serves as a reference implementation that emphasizes modularity and extensibility. Its pluggable design enables developers to customize components and optimize for specific performance, memory, and deployment needs.

Feature Support Matrix

CategoryStatusFeature
Backend✅Local
✅Kubernetes
LLM Framework✅vLLM
✅TensorRT-LLM
❌SGLang
Serving Type✅Aggregated
✅Disaggregated
Previous

SLA-based Planner

Next

Motivation behind KVBM