Release Notes for NIM Turbo#
This page lists updates, fixes, and known issues for NIM Turbo releases.
Updates#
The following VLM NIM is now available:
The following LLM NIM is now available:
Known Issues#
This release includes the following known issues and limitations:
-
This turbo NIM is tuned for high throughput per GPU at a concurrency of 256.
The B200 profile supports the NVFP4 precision only. INT4 is not supported.
The NIM_MANIFEST_ALLOW_UNSAFE environment variable is not supported.
The /v1/response endpoint is not available in this NIM.
On the /v1/completions endpoint, the structured_outputs.choice field does not strictly enforce constrained outputs.
-
This turbo NIM is tuned for high throughput per GPU at a concurrency of 256.
At a concurrency of 64 or lower, enable MTP by setting NIM_NUM_SPECULATIVE_TOKENS=2.
At a concurrency above 64, do not use MTP; the NIM performs best without it.
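The concurrency guidance for MTP can be captured in a small deployment helper. The following is a minimal Python sketch, not part of the NIM itself: the function name mtp_env and both constants are hypothetical, and it assumes that leaving NIM_NUM_SPECULATIVE_TOKENS unset is how MTP stays disabled, which these notes do not state explicitly.

```python
from typing import Dict

# Values taken from the release note; the constant names are illustrative only.
MTP_CONCURRENCY_LIMIT = 64
MTP_SPECULATIVE_TOKENS = 2


def mtp_env(expected_concurrency: int) -> Dict[str, str]:
    """Return the extra environment variables to set for a target concurrency.

    Assumption: omitting NIM_NUM_SPECULATIVE_TOKENS leaves MTP disabled.
    """
    if expected_concurrency < 1:
        raise ValueError("expected_concurrency must be at least 1")
    if expected_concurrency <= MTP_CONCURRENCY_LIMIT:
        # Low concurrency: the release note recommends MTP with 2 speculative tokens.
        return {"NIM_NUM_SPECULATIVE_TOKENS": str(MTP_SPECULATIVE_TOKENS)}
    # High concurrency: the release note recommends running without MTP.
    return {}
```

The returned mapping could be merged into whatever environment the container launcher passes to the NIM, so the decision is made once from the expected load rather than hard-coded per deployment.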
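Because the first NIM listed above does not strictly enforce structured_outputs.choice on /v1/completions, a client may want to verify the returned text itself. The following is a minimal Python sketch of such a check. The helper name match_choice is hypothetical, it makes no calls to the NIM, and it assumes the caller has already extracted the completion text from the response.

```python
from typing import Optional, Sequence


def match_choice(completion_text: str, allowed: Sequence[str]) -> Optional[str]:
    """Return the allowed choice that the completion equals, or None.

    Surrounding whitespace is ignored; the comparison is otherwise exact.
    """
    candidate = completion_text.strip()
    for choice in allowed:
        if candidate == choice:
            return choice
    # The model produced text outside the allowed set.
    return None
```

A caller could retry the request or fall back to a default when the function returns None; which policy fits depends on the application.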