Release Notes#
Release 2.0.4-variant#
This release contains model updates outlined in the following sections.
Kimi-K2.6#
This is an updated release of Kimi-K2.6. For more information on this model, refer to the model card.
For GPU support, refer to the support matrix for Kimi-K2.6.
Note the following limitations:
Kimi-K2.6
Only the INT4 precision profile is supported. BF16 and FP8 are not provided.
First-time container start downloads a ~554 GB NGC artifact; allow ~60-90 min for cold start on a fast NVMe cache. Subsequent restarts reuse the local cache.
Requests to the
/v1/completionsendpoint with a blank prompt return an emptytextfield. Use the/v1/chat/completionsendpoint instead.A request to the
/v1/chat/completionsendpoint that includes an emptystructured_outputs.jsonschema (“”) may crash the underlying vLLM engine and terminate the container.list-model-profilesmay classify a profile as runnable on SKUs not listed in the support matrix; deployment may fail with a CUDA out-of-memory error.
For information about past updates and older versions, refer to the previous release notes.