For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Digest
  • Getting Started
    • Quickstart
    • Introduction
    • Local Installation
    • Building from Source
    • Kubernetes Deployment
    • Contribution Guide
  • Resources
    • Support Matrix
    • Feature Matrix
    • Release Artifacts
    • Examples
    • Glossary
  • Digest
    • Dynamo Day 0 support for TokenSpeed
    • Multi-Turn Agentic Harnesses
    • Full-Stack Optimizations for Agentic Inference
    • Flash Indexer: Inter-Galactic KV Routing
  • Kubernetes Deployment
  • User Guides
    • Disaggregated Serving
    • KV Cache Aware Routing
    • KV Cache Offloading
    • Tool Calling
    • Reasoning
    • Agents
    • Multimodal
    • Diffusion
    • LoRA Adapters
    • Observability (Local)
    • Fault Tolerance
    • Benchmarking
    • Writing Python Workers
  • Backends
    • SGLang
    • TensorRT-LLM
    • vLLM
  • Components
    • Frontend
    • Router
    • Planner
    • Profiler
    • KVBM
  • Integrations
    • LMCache
    • SGLang HiCache
    • FlexKV
    • KV Events for Custom Engines
  • Design Docs
    • Overall Architecture
    • Architecture Flow
    • Disaggregated Serving
    • Distributed Runtime
  • Documentation
    • Dynamo Docs Guide
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogoDocumentation
Digest
Digest

Dynamo Day 0 support for TokenSpeed

A short launch note for running TokenSpeed with Dynamo — May 2026

||View as Markdown|

TokenSpeed (GitHub) launched today as LightSeek’s new inference engine for agentic workloads. The initial repo is a preview, with more model coverage and runtime features landing over the next few weeks.

Two pieces are worth calling out. First, TokenSpeed includes new MLA kernel work for long-context Kimi-style workloads on Blackwell. Second, TokenSpeed has a native C++ scheduler in tokenspeed-scheduler/ that models request flow and cache operations as explicit state machines, while Python remains the runtime and integration layer.

Dynamo now has day-0 support for running TokenSpeed as a Dynamo backend through python -m dynamo.tokenspeed. The Dynamo frontend remains the user-facing OpenAI-compatible API entrypoint and handles request routing, streaming responses, and cancellation.

See the Kimi K2.5 TokenSpeed recipe for the current Dynamo launch recipe.

Things are moving quickly. Upstream TokenSpeed calls out ongoing work on model coverage, P/D, EPLB, KV store, Mamba cache, VLM, metrics, Hopper optimization, and related runtime features.

Last updated May 6, 2026

Previous

Dynamo Digest

Next

Multi-Turn Agentic Harnesses