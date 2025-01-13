Release Notes#
Release 1.1.2#
Summary#
This is the latest version of NIM.
Language Models#
Llama 3 Swallow 70B Instruct V0.1
Llama 3 Taiwan 70B Instruct
Note: For the H100 TP8 FP8 Latency Profile, we have intermittently observed higher TTFT values at low concurrency values.
Llama 3.1 405B Instruct
Note: Due to the large size of this model, it is only supported on a subset of GPUs and optimization targets. See the Support Matrix for more details.
Mistral-NeMo-12B-Instruct
Nemotron 4 340B Instruct
New Features#
Added support for vLLM fallback profiles for Llama 3.1 8B Base, Llama 3.1 8B Instruct, and Llama 3.1 70B Instruct
Known Issues#
LoRA is not supported for Llama 3.1 405B Instruct
vLLM profiles are not supported for Llama 3.1 405B Instruct
Throughput optimized profiles are not supported on A100 FP16 and H100 FP16 for Llama 3.1 405B Instruct
Cache deployment fails on an air-gapped system or read-only volume when using the multi-GPU vLLM profile
Users deploying a cache into an air-gapped system or read-only volume and intending to use the multi-GPU vLLM profile must create the following JSON file from the system used to initially download and generate the cache:
echo '{
  "0->0": false,
  "0->1": true,
  "1->0": true,
  "1->1": false
}' > "$NIM_CACHE_PATH/vllm/vllm/gpu_p2p_access_cache_for_0,1.json"
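The command above assumes that NIM_CACHE_PATH is set and that the vllm/vllm directory already exists under it. A slightly more defensive variant might look like the following sketch. The helper name write_p2p_cache is hypothetical and is not part of NIM or vLLM; the file name and JSON contents are taken directly from the workaround above and cover only the two-GPU (0,1) case.

```shell
#!/usr/bin/env bash
# Sketch only: writes the P2P access cache file described above for GPUs 0 and 1.
# write_p2p_cache is a hypothetical helper name, not part of NIM or vLLM.
write_p2p_cache() {
  local cache_root="$1"
  local target_dir="${cache_root}/vllm/vllm"
  local target_file="${target_dir}/gpu_p2p_access_cache_for_0,1.json"

  # Create the directory if it is missing; harmless when it already exists.
  mkdir -p "${target_dir}" || return 1

  # Quoted heredoc delimiter: the JSON is written literally, with no expansion.
  cat > "${target_file}" <<'EOF'
{
  "0->0": false,
  "0->1": true,
  "1->0": true,
  "1->1": false
}
EOF
  # Print the path that was written so the caller can verify or copy it.
  printf '%s\n' "${target_file}"
}

# Usage (run on the system that downloaded and generated the cache):
#   write_p2p_cache "${NIM_CACHE_PATH}"
```

Run this before the cache is copied to the air-gapped system or mounted read-only, so that the file travels with the rest of the cache.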
CUDA out of memory issue for Llama 2 70B v1.0.3
The vllm-fp16-tp2 profile has been validated and is known to work on H100 x 2 and A100 x 2 configurations. Other types of GPUs might encounter a “CUDA out of memory” issue.
Llama 3.1 FP8 requires NVIDIA driver version >= 550
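One way to check the driver requirement before deploying is to compare the major component of the installed driver version against 550. The following is a minimal sketch, not an official NIM check. The helper name driver_major_ok is hypothetical, and the usage line assumes nvidia-smi is available on the host.

```shell
#!/usr/bin/env bash
# Sketch only: driver_major_ok is a hypothetical helper, not part of NIM.
# Returns 0 when the major component of a driver version string is >= 550.
driver_major_ok() {
  local version="$1"
  local major="${version%%.*}"   # "550.54.15" -> "550"
  case "${major}" in
    ''|*[!0-9]*) return 2 ;;     # empty or non-numeric input
  esac
  [ "${major}" -ge 550 ]
}

# Usage on a host with the NVIDIA driver installed:
#   version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
#   if driver_major_ok "${version}"; then
#     echo "driver ${version} meets the >= 550 requirement"
#   else
#     echo "driver ${version} is too old for Llama 3.1 FP8"
#   fi
```

Only the first GPU's driver version is read in the usage example, on the assumption that all GPUs on a host share one driver.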
Release 1.1.1#
Summary#
Removed incompatible vLLM profiles for Llama 3.1 8B Base, Llama 3.1 8B Instruct, and Llama 3.1 70B Instruct
Known Issues#
vLLM profiles are not supported for Llama 3.1 8B Base, Llama 3.1 8B Instruct, and Llama 3.1 70B Instruct
Release 1.1.0#
Summary#
This is an update of NIM.
Language Models#
Llama 3.1 8B Base
Llama 3.1 8B Instruct
Llama 3.1 70B Instruct
New Features#
Chunked pre-fill
Experimental support for Llama Stack API
Known Issues#
vLLM profiles for Llama 3.1 models will fail with ValueError: Unknown RoPE scaling type extended.
NIM does not support Multi-Instance GPU (MIG) mode.
Release 1.0#
Release notes for Release 1.0 are located in the 1.0 documentation.