SGLang Release 26.06

This SGLang container release is intended for use on NVIDIA® Hopper architecture GPUs (NVIDIA H100) and NVIDIA® Ampere architecture GPUs (NVIDIA A100), together with the associated NVIDIA CUDA® 13 and NVIDIA cuDNN 9 libraries. The NVIDIA container image for the SGLang release is available on NGC.

Contents of the SGLang container

This container image contains the complete source of the included SGLang version in /opt/sglang. SGLang is pre-built and installed in the default Python environment (/usr/local/lib/python3.12/dist-packages/sglang/) in the container image. Visit SGLang Docs to learn more about SGLang.
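One way to confirm where the pre-built package is imported from inside a running container is to ask Python's import machinery. The following is a minimal sketch, not an official tool; the helper name `locate_package` is hypothetical and is not part of SGLang.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_package(name: str) -> Optional[Path]:
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    # A missing package yields no spec; a namespace package has no origin file.
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    # Inside the container this is expected to print the dist-packages
    # directory mentioned above; elsewhere it prints None if sglang is absent.
    print(locate_package("sglang"))
```

The lookup does not import the package, so it is cheap to run and has no GPU side effects.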

The NVIDIA SGLang Container is optimized for use with NVIDIA GPUs, and contains the following software for GPU acceleration.

Please see the CUDA section for the list of libraries inherited from the CUDA container.
NVIDIA CUDA 13.3.0
SGLang 0.5.12.post1
flashinfer 0.6.12
transformers 5.6.0
flash-attention 2.7.4.post1
xgrammar 0.2.0
Torch 2.13.0a0+8145d630e8
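To check that a running environment matches the versions listed above, the installed package metadata can be compared against the expected values. This is a minimal sketch under stated assumptions: the helper `find_mismatches` is hypothetical, and the distribution names used as keys are assumptions about how each package is registered in the image. Only packages whose distribution name is likely to match the name in the list are included.

```python
from importlib import metadata
from typing import Callable, Dict

# Versions copied from the list above. The keys are ASSUMED distribution
# names; adjust them if the image registers a package under another name.
EXPECTED: Dict[str, str] = {
    "sglang": "0.5.12.post1",
    "transformers": "5.6.0",
    "xgrammar": "0.2.0",
}


def find_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return {name: found_version} for every package that differs from expected."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    # An empty dict means every checked package matches the release notes.
    print(find_mismatches(EXPECTED))
```

The `lookup` parameter exists only so the comparison can be exercised without the packages installed.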

Driver Requirements

Release 26.06 is based on CUDA 13.3.0. For comprehensive and up-to-date driver compatibility information, please refer to the following documentation:

NVIDIA CUDA Compatibility Guide - Compatibility information between CUDA versions and driver releases
CUDA Toolkit Release Notes - Driver version requirements and compatibility matrices
NVIDIA Drivers Download - Latest NVIDIA drivers.
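Before consulting the compatibility documents above, it helps to know which driver version the host reports. The sketch below reads it through `nvidia-smi`; the helper names are hypothetical, and it assumes `nvidia-smi` is on the PATH. These release notes do not state a minimum driver version, so the result should be compared against the linked documentation rather than a value hard-coded here.

```python
import subprocess
from typing import List, Tuple


def parse_driver_versions(smi_output: str) -> List[Tuple[int, ...]]:
    """Turn one dotted version string per line into tuples of integers."""
    versions = []
    for line in smi_output.splitlines():
        line = line.strip()
        if line:
            versions.append(tuple(int(part) for part in line.split(".")))
    return versions


def installed_driver_versions() -> List[Tuple[int, ...]]:
    """Query nvidia-smi for the driver version reported by each visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if nvidia-smi fails
    )
    return parse_driver_versions(result.stdout)
```

Tuples compare element by element, so a reported version can be checked against a required one with an ordinary `>=` comparison.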

Key Features and Enhancements

This SGLang release includes the following key features and enhancements.

Support for multi-node configurations.
GB300/B300 support.
RTX PRO™ 6000 Blackwell Server Edition support.
DGX Spark support.
Jetson Thor support.
Support for 8-bit floating point (FP8) precision on Hopper GPUs and above.
Support for the NVIDIA 4-bit floating point (NVFP4) format on Blackwell GPUs (including Jetson Thor and DGX Spark), which provides better training and inference performance with lower memory utilization.
Support for DeepSeek-R1 and Llama-3.1-8B-Instruct.
Support for openai/gpt-oss-20b and openai/gpt-oss-120b.
Support for Nemotron-3 Nano Omni.
Support for Qwen3.6-35B-A3B-FP8.
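The FP8 and NVFP4 items in the list above depend on GPU generation. As a rough illustration, the rule can be expressed in terms of CUDA compute capability. This is a simplified sketch, not SGLang's actual dispatch logic: the helper `precisions_for` is hypothetical, and the thresholds are assumptions that read "Hopper and above" as compute capability 9.0 or higher and "Blackwell" as 10.0 or higher.

```python
from typing import List, Tuple

# ASSUMED thresholds for illustration only.
HOPPER = (9, 0)      # compute capability of H100-class GPUs
BLACKWELL = (10, 0)  # lowest compute capability treated here as Blackwell


def precisions_for(capability: Tuple[int, int]) -> List[str]:
    """List the low-precision formats from the notes that a GPU would qualify for."""
    formats = []
    if capability >= HOPPER:
        formats.append("FP8")
    if capability >= BLACKWELL:
        formats.append("NVFP4")
    return formats


if __name__ == "__main__":
    print(precisions_for((8, 0)))   # Ampere-class example
    print(precisions_for((9, 0)))   # Hopper-class example
    print(precisions_for((10, 0)))  # Blackwell-class example
```

In the container, the capability tuple for the current device can be obtained with `torch.cuda.get_device_capability()`, which returns `(major, minor)`. Whether a particular model and kernel combination actually runs in a given format still depends on SGLang itself.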

Announcements

None.

Known Issues

MTP is not supported for NVIDIA-Nemotron-3-Super models.
There is a known issue with Phi 4 Multimodal Instruct FP8.
The 26.06 SGLang container release includes no known vulnerabilities (CVEs).