Release Notes for NIM Turbo

This page lists updates, fixes, and known issues for NIM Turbo releases.

Updates

The following VLM NIM is now available:

The following LLM NIM is now available:

Known Issues

This release includes the following known issues and limitations:

  • Kimi-K2.5

    • This turbo NIM is tuned for high throughput per GPU at a concurrency of 256.

    • The B200 profile supports NVFP4 precision only. INT4 is not supported.

    • The NIM_MANIFEST_ALLOW_UNSAFE environment variable is not supported.

    • The /v1/response endpoint is not available in this NIM.

    • On the /v1/completions endpoint, the structured_outputs.choice field does not strictly enforce constrained outputs.

  • Nemotron 3 Super 120B

    • This turbo NIM is tuned for high throughput per GPU at a concurrency of 256.

    • For best performance at a concurrency of 64 or lower, enable multi-token prediction (MTP) by setting NIM_NUM_SPECULATIVE_TOKENS=2. At a concurrency above 64, do not enable MTP.
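
The Nemotron 3 Super 120B guidance above can be captured in a small deployment helper. The following Python sketch is illustrative only: the function names speculative_env and docker_env_flags are hypothetical, and it assumes that leaving NIM_NUM_SPECULATIVE_TOKENS unset keeps MTP disabled, which these notes do not state explicitly.

```python
from typing import Dict, List

# Threshold taken from the known-issue note: MTP is recommended at or below this value.
MTP_CONCURRENCY_LIMIT = 64


def speculative_env(expected_concurrency: int) -> Dict[str, str]:
    """Return environment overrides for the expected peak concurrency.

    Hypothetical helper. Assumes that an unset NIM_NUM_SPECULATIVE_TOKENS
    means MTP stays off; verify this against your NIM's configuration docs.
    """
    if expected_concurrency < 1:
        raise ValueError("expected_concurrency must be a positive integer")
    if expected_concurrency <= MTP_CONCURRENCY_LIMIT:
        # Value recommended by the release note for concurrency <= 64.
        return {"NIM_NUM_SPECULATIVE_TOKENS": "2"}
    # Above the threshold the note advises against MTP, so set nothing.
    return {}


def docker_env_flags(env: Dict[str, str]) -> List[str]:
    """Render the overrides as `-e NAME=value` arguments for `docker run`."""
    flags: List[str] = []
    for name, value in sorted(env.items()):
        flags += ["-e", f"{name}={value}"]
    return flags


if __name__ == "__main__":
    for concurrency in (16, 64, 256):
        print(concurrency, docker_env_flags(speculative_env(concurrency)))
```

Keeping the threshold in one constant makes it easy to revisit if a later release changes the recommendation.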
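
Because structured_outputs.choice is not strictly enforced on /v1/completions for Kimi-K2.5, a client may want to check the returned text against the allowed choices itself. The sketch below shows one possible approach. The names match_choice and generate_choice are hypothetical, and generate stands in for whatever caller-supplied function sends the request and returns the completion text; no NIM API call is made here.

```python
from typing import Callable, Optional, Sequence


def match_choice(text: str, choices: Sequence[str]) -> Optional[str]:
    """Return the allowed choice that exactly matches the text, else None.

    Surrounding whitespace is ignored; the comparison is case-sensitive.
    """
    candidate = text.strip()
    return candidate if candidate in choices else None


def generate_choice(
    generate: Callable[[], str],
    choices: Sequence[str],
    max_attempts: int = 3,
) -> str:
    """Call `generate` until it yields an allowed choice or attempts run out.

    `generate` is a hypothetical caller-supplied function that performs the
    completion request and returns the generated text.
    """
    for _ in range(max_attempts):
        matched = match_choice(generate(), choices)
        if matched is not None:
            return matched
    raise ValueError(f"no allowed choice returned after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated model outputs: the first is off-list, the second is valid.
    outputs = iter(["maybe", " yes\n"])
    print(generate_choice(lambda: next(outputs), ["yes", "no"]))
```

Retrying is only one option. Depending on the application, rejecting the response or falling back to a default choice may be more appropriate.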