Release Notes

This page lists changes, fixes, and known issues for each NIM LLM early access (EA) release.

Updates

The following VLM NIMs are now available:

The following LLM NIMs are now available:

Dynamo recipes are now available on GitHub for the following NIMs:

This release includes the following known issues and limitations:

Kimi-K2.5
- This turbo NIM is tuned for high throughput per GPU at a concurrency of 256.
- The B200 profile supports NVFP4 precision only. INT4 is not supported.
- The NIM_MANIFEST_ALLOW_UNSAFE environment variable is not supported.
- The /v1/responses endpoint is not available in this NIM.
- On the /v1/completions endpoint, the structured_outputs.choice field does not strictly enforce constrained outputs.
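Because structured_outputs.choice is not strictly enforced on /v1/completions, a client may want to check the returned text itself. The following is a minimal sketch, not an official client: the helper names `build_choice_request` and `extract_choice` are hypothetical, the model name is a placeholder, and the response is assumed to follow the OpenAI-style completions shape (`choices[0].text`).

```python
from typing import Optional, Sequence


def build_choice_request(model: str, prompt: str, choices: Sequence[str]) -> dict:
    """Build a /v1/completions request body that asks for one of `choices`."""
    return {
        "model": model,  # placeholder; use the model name your deployment reports
        "prompt": prompt,
        "structured_outputs": {"choice": list(choices)},
    }


def extract_choice(response: dict, choices: Sequence[str]) -> Optional[str]:
    """Return the generated text if it is exactly one of `choices`, else None.

    Assumes an OpenAI-style completions response: response["choices"][0]["text"].
    """
    text = response["choices"][0]["text"].strip()
    return text if text in choices else None


if __name__ == "__main__":
    labels = ["positive", "negative"]
    body = build_choice_request("example-model", "Sentiment of: 'Great!'", labels)
    print(body)

    # Simulated responses; no network call is made in this sketch.
    print(extract_choice({"choices": [{"text": " positive\n"}]}, labels))
    print(extract_choice({"choices": [{"text": "mostly positive"}]}, labels))
```

When `extract_choice` returns None, the caller can retry the request or fall back to a default, depending on the application.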
Nemotron-3-Super-120B-A12B
- This turbo NIM is tuned for high throughput per GPU at a concurrency of 256.
- At a concurrency of 64 or lower, enable multi-token prediction (MTP) by setting NIM_NUM_SPECULATIVE_TOKENS=2. For optimal performance at a concurrency above 64, do not use MTP.
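The concurrency guidance above can be captured in a small deployment helper. This is a sketch under stated assumptions: `mtp_env` and `docker_env_flags` are hypothetical names, and it assumes that leaving NIM_NUM_SPECULATIVE_TOKENS unset keeps MTP off, which the release note does not state explicitly.

```python
# Threshold and token count are taken from the release note.
MTP_MAX_CONCURRENCY = 64
MTP_SPECULATIVE_TOKENS = "2"


def mtp_env(concurrency: int) -> dict[str, str]:
    """Return the environment overrides suggested for a target concurrency."""
    if concurrency < 1:
        raise ValueError("concurrency must be a positive integer")
    if concurrency <= MTP_MAX_CONCURRENCY:
        return {"NIM_NUM_SPECULATIVE_TOKENS": MTP_SPECULATIVE_TOKENS}
    # Assumption: with the variable unset, MTP is not used.
    return {}


def docker_env_flags(env: dict[str, str]) -> list[str]:
    """Convert environment overrides into `-e KEY=VALUE` arguments."""
    flags: list[str] = []
    for key, value in sorted(env.items()):
        flags += ["-e", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    for target in (16, 64, 256):
        print(target, docker_env_flags(mtp_env(target)))
```

The resulting flags can be appended to whatever container launch command the deployment already uses.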
GPT-OSS-120b-Turbo
- This turbo NIM provides workload-specific profiles for RAG (ISL 2,400, OSL 1,000, SSP 5,600) and agentic (ISL 6,400, OSL 400, SSP 57,600) workloads on B200 and H200 GPUs.