Release Notes#

Release 2.0.4-variant#

This release contains model updates outlined in the following sections.

Kimi-K2.6#

This is an updated release of Kimi-K2.6. For more information on this model, refer to the model card.

For GPU support, refer to the support matrix for Kimi-K2.6.

Note the following limitations:

  • Kimi-K2.6

    • Only the INT4 precision profile is supported. BF16 and FP8 are not provided.

    • First-time container start downloads a ~554 GB NGC artifact; allow ~60-90 min for cold start on a fast NVMe cache. Subsequent restarts reuse the local cache.

    • Requests to the /v1/completions endpoint with a blank prompt return an empty text field. Use the /v1/chat/completions endpoint instead.

    • A request to the /v1/chat/completions endpoint that includes an empty structured_outputs.json schema (“”) may crash the underlying vLLM engine and terminate the container.

    • list-model-profiles may classify a profile as runnable on SKUs not listed in the support matrix; deployment may fail with a CUDA out-of-memory error.

For information about past updates and older versions, refer to the previous release notes.