
Copyright © 2026, NVIDIA Corporation.

KVBM Integrations


KVBM integrates with inference frameworks (vLLM, TRTLLM, SGLang) via Connector APIs to influence KV caching behaviour, scheduling, and forward pass execution. The interface has two components: the Scheduler and the Worker. The Scheduler (leader) orchestrates KV block offload and onboard, and builds metadata that specifies to the workers which data to transfer. It also maintains hooks for handling asynchronous transfer completion. The Worker reads the metadata built by the Scheduler (leader) and performs asynchronous onboarding and offloading at the end of the forward pass.
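
The division of labour described above can be illustrated with a minimal Python sketch. It is not the KVBM or vLLM connector API: every name here (`SchedulerSide`, `WorkerSide`, `ConnectorMetadata`, `Transfer`, `Direction`, and their methods) is hypothetical and chosen only for illustration, and the "transfer" is simulated synchronously where a real worker would launch an asynchronous copy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Direction(Enum):
    ONBOARD = "onboard"  # host or disk -> device
    OFFLOAD = "offload"  # device -> host or disk


@dataclass(frozen=True)
class Transfer:
    """One unit of work: which blocks of which request move in which direction."""
    request_id: str
    block_ids: tuple
    direction: Direction


@dataclass
class ConnectorMetadata:
    """Hypothetical per-step metadata handed from the scheduler to the workers."""
    transfers: list = field(default_factory=list)


class SchedulerSide:
    """Leader role: decides what to move and tracks what is still in flight."""

    def __init__(self):
        self._queued = []
        self.in_flight = set()

    def request_transfer(self, request_id, block_ids, direction):
        self._queued.append(Transfer(request_id, tuple(block_ids), direction))

    def build_metadata(self):
        # Package the queued transfers for this step and mark them in flight.
        meta = ConnectorMetadata(transfers=self._queued)
        self.in_flight.update((t.request_id, t.direction) for t in self._queued)
        self._queued = []
        return meta

    def on_transfer_complete(self, request_id, direction):
        # Completion hook: the worker reports back once a transfer is done.
        self.in_flight.discard((request_id, direction))


class WorkerSide:
    """Worker role: reads the metadata and moves blocks after the forward pass."""

    def __init__(self):
        self._meta = None

    def bind_metadata(self, meta):
        self._meta = meta

    def end_of_forward_pass(self):
        # A real worker would start asynchronous copies here; this sketch
        # simply reports every transfer as finished.
        transfers = self._meta.transfers if self._meta else []
        finished = [(t.request_id, t.direction) for t in transfers]
        self._meta = None
        return finished


# One scheduling step, end to end.
scheduler = SchedulerSide()
worker = WorkerSide()
scheduler.request_transfer("req-0", [3, 4], Direction.OFFLOAD)
scheduler.request_transfer("req-1", [7], Direction.ONBOARD)

meta = scheduler.build_metadata()
worker.bind_metadata(meta)
done = worker.end_of_forward_pass()
for request_id, direction in done:
    scheduler.on_transfer_complete(request_id, direction)
```

The point of the sketch is that the scheduler never touches block data and the worker never decides policy; the metadata object is the only thing that crosses between them, and the completion hook closes the loop.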

Typical KVBM Integrations

The following figure shows the typical integration of KVBM with inference frameworks, using vLLM as an example.

[Figure: vLLM KVBM Integration]

How to run KVBM with Frameworks

  • Instructions to run KVBM in vLLM
  • Instructions to run KVBM with TRTLLM

Onboarding

[Figure: Onboarding blocks from Host to Device]

[Figure: Onboarding blocks from Disk to Device]

Offloading

[Figure: Offloading blocks from Device to Host and Disk]