NVIDIA Optimized Frameworks

Release 25.10

This SGLang container release is intended for use on NVIDIA® Hopper Architecture GPUs (NVIDIA H100) and NVIDIA® Ampere Architecture GPUs (NVIDIA A100), with the associated NVIDIA CUDA® 13 and NVIDIA cuDNN 9 libraries.

Driver Requirements

Release 25.10 is based on CUDA 13.0.2.006, which requires NVIDIA Driver release 570 or later. However, if you are running on a data center GPU (for example, B100, L40, or any other data center GPU), you can use NVIDIA driver release 470.57 (or later R470), 525.85 (or later R525), 535.86 (or later R535), or 550.54 (or later R550) in forward-compatibility mode.

The CUDA driver's compatibility package supports only particular drivers. Users should therefore upgrade from all R418, R440, R450, R460, R510, R520, R530, R545, R555, and R560 drivers, which are not forward-compatible with CUDA 13.0. For a complete list of supported drivers, see the CUDA Application Compatibility topic. For more information, see CUDA Compatibility and Upgrades.
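The driver rules above can be expressed as a small check. The following is a minimal sketch, not an NVIDIA tool: the function name classify_driver and its return labels are hypothetical, and the thresholds are copied from the text of this section. Forward-compatibility mode applies only to data center GPUs, which this sketch does not verify.

```python
import re

# Minimum driver branch stated in this section for running without the compat package.
MIN_NATIVE_MAJOR = 570

# Forward-compatibility branches and their minimum minor versions, as listed above.
FORWARD_COMPAT_MIN = {470: 57, 525: 85, 535: 86, 550: 54}


def classify_driver(version: str) -> str:
    """Return 'native', 'forward-compat', or 'unsupported' for a driver version string.

    Hypothetical helper: it only compares the first two numeric fields
    (for example, '535.104' from '535.104.05').
    """
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized driver version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if major >= MIN_NATIVE_MAJOR:
        return "native"
    min_minor = FORWARD_COMPAT_MIN.get(major)
    if min_minor is not None and minor >= min_minor:
        # Valid only on data center GPUs, per the text above.
        return "forward-compat"
    return "unsupported"
```

The installed driver version can be read on the host with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed to the function as a string.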

Contents of the container

This container image contains the complete source of the included version of SGLang in /opt/sglang. SGLang is pre-built and installed in the default Python environment (/usr/local/lib/python3.12/dist-packages/sglang/) in the container image. Visit SGLang Docs to learn more about SGLang.

The NVIDIA SGLang Container is optimized for use with NVIDIA GPUs, and contains the following software for GPU acceleration.

  • Refer to the CUDA section for the list of libraries inherited from the CUDA container.
  • SGLang 0.5.3rc1
  • flashinfer 0.4.0
  • transformers 4.56.1
  • flash-attention 2.7.4
  • xgrammar 0.1.24
  • NVIDIA PyTorch 25.10
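
To confirm that a running container matches the versions listed above, the installed package metadata can be queried from Python. This is a minimal sketch: the helper report_versions is hypothetical, and the distribution names used as dictionary keys are assumptions, since the names registered in the image may differ from the component names in the list.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions taken from the list above. The keys are ASSUMED distribution
# names; adjust them if the image registers the packages differently.
EXPECTED = {
    "sglang": "0.5.3rc1",
    "flashinfer-python": "0.4.0",
    "transformers": "4.56.1",
    "flash-attn": "2.7.4",
    "xgrammar": "0.1.24",
}


def report_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each distribution name to (expected, installed).

    The installed value is None when the distribution is not found.
    """
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        report[name] = (want, have)
    return report


if __name__ == "__main__":
    for name, (want, have) in report_versions(EXPECTED).items():
        print(f"{name}: expected {want}, installed {have}")
```

A None result for a package more likely indicates a mismatched distribution name than a missing component.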

Driver Requirements

Release 25.10 is based on CUDA 13.0. For comprehensive and up-to-date driver compatibility information, refer to the CUDA Application Compatibility topic and CUDA Compatibility and Upgrades.

Key Features and Enhancements

This SGLang release includes the following key features and enhancements.

  • Compatibility with CUDA 13.0.
  • Support for multi-node configurations.
  • GB300/B300 support.
  • RTX PRO™ 6000 Blackwell Server Edition support.
  • DGX Spark support.
  • Jetson Thor support.
  • Support for 8-bit floating point (FP8) precision on Hopper GPUs and above.
  • Support for the NVIDIA 4-bit floating point (NVFP4) format on Blackwell GPUs (including Jetson Thor and DGX Spark), which provides better training and inference performance with lower memory utilization.
  • Support for DeepSeek-R1 and Llama-3.1-8B-Instruct.
  • Support for openai/gpt-oss-20b and openai/gpt-oss-120b.
Announcements

  • 25.10 is the first NVIDIA SGLang container release that brings optimizations for NVIDIA GPUs.

Known Issues

  • gpt-oss family models cannot run on DGX Spark and Jetson Thor due to an OpenAI Triton issue.
  • FP8 models fail on Jetson Thor.

NVIDIA SGLang Container Versions

The SGLang container supports the same version of Ubuntu and CUDA as the PyTorch container.

© Copyright 2025, NVIDIA. Last updated on Oct 29, 2025.