Large Language Models (1.1.0)

Release Notes

Summary

This is the latest version of NIM.

Language Models

  • Llama 3 Swallow 70B Instruct V0.1

  • Llama 3 Taiwan 70B Instruct

    • Note: For the H100 TP8 FP8 Latency Profile, we have intermittently observed higher TTFT values at low concurrency values.

  • Llama 3.1 405B Instruct

    • Note: Due to the large size of this model, it is only supported on a subset of GPUs and optimization targets. See the Support Matrix for more details.

  • Mistral-NeMo-12B-Instruct

  • Nemotron 4 340B Instruct

New Features

  • Added support for vLLM fallback profiles for Llama 3.1 8B Base, Llama 3.1 8B Instruct, and Llama 3.1 70B Instruct

Known Issues

LoRA is not supported for Llama 3.1 405B Instruct

vLLM profiles are not supported for Llama 3.1 405B Instruct

Throughput-optimized profiles are not supported on A100 FP16 and H100 FP16 for Llama 3.1 405B Instruct

Cache deployment fails for air-gapped system or read-only volume for multi-GPU vLLM profile
Users deploying a cache into an air-gapped system or read-only volume and intending to use the multi-GPU vLLM profile must create the following JSON file from the system used to initially download and generate the cache:


echo '{ "0->0": false, "0->1": true, "1->0": true, "1->1": false }' > $NIM_CACHE_PATH/vllm/vllm/gpu_p2p_access_cache_for_0,1.json
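
A slightly more defensive variant of the same step could look like the sketch below. It writes the same JSON content to the same relative path, but first creates the target directory and, where `python3` is installed, checks that the result parses as JSON. The helper name `write_p2p_cache` and the `python3` validation step are illustrative assumptions, not part of NIM.

```shell
#!/bin/sh
# Sketch only: write the GPU P2P access cache file for a two-GPU vLLM profile.
# write_p2p_cache is a hypothetical helper name; the relative path and the
# JSON content are the ones given in the release note.
write_p2p_cache() {
    cache_root="$1"
    target_dir="$cache_root/vllm/vllm"
    target_file="$target_dir/gpu_p2p_access_cache_for_0,1.json"

    # Create the directory tree if it does not exist yet.
    mkdir -p "$target_dir" || return 1

    # Write the P2P access map for GPUs 0 and 1.
    printf '%s\n' '{ "0->0": false, "0->1": true, "1->0": true, "1->1": false }' \
        > "$target_file" || return 1

    # Optional sanity check (assumption: python3 may or may not be present).
    if command -v python3 >/dev/null 2>&1; then
        python3 -m json.tool "$target_file" >/dev/null || return 1
    fi

    # Report where the file was written.
    printf '%s\n' "$target_file"
}
```

On the system used to download and generate the cache, this would be called as `write_p2p_cache "$NIM_CACHE_PATH"` before the cache is copied to the air-gapped system or read-only volume.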

CUDA out of memory issue for Llama2 70b v1.0.3
The vllm-fp16-tp2 profile has been validated and is known to work on H100 x 2 and A100 x 2 configurations. Other types of GPUs might encounter a “CUDA out of memory” issue.

Llama 3.1 FP8 requires NVIDIA driver version >= 550
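
One way to check this requirement before deploying is sketched below. It assumes that comparing the major component of the driver version string against 550 is an adequate reading of ">= 550". The helper name `driver_meets_minimum` is hypothetical; the commented `nvidia-smi` query is shown only as an example of where the version string could come from.

```shell
#!/bin/sh
# Sketch only: driver_meets_minimum is a hypothetical helper.
# Succeeds (exit 0) when the major component of a driver version string
# such as "550.54.15" is greater than or equal to the minimum (default 550).
# Returns 1 when the driver is older, 2 when the string is not numeric.
driver_meets_minimum() {
    version="$1"
    minimum="${2:-550}"
    major="${version%%.*}"   # keep everything before the first dot

    case "$major" in
        ''|*[!0-9]*) return 2 ;;   # empty or non-numeric major component
    esac

    [ "$major" -ge "$minimum" ]
}

# Example use on a host with nvidia-smi installed:
# installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
# driver_meets_minimum "$installed" || echo "Driver $installed is older than 550" >&2
```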

Summary

Removed incompatible vLLM profiles for Llama 3.1 8B Base, Llama 3.1 8B Instruct, and Llama 3.1 70B Instruct

Known Issues

  • vLLM profiles are not supported for Llama 3.1 8B Base, Llama 3.1 8B Instruct, and Llama 3.1 70B Instruct

Summary

This is an update of NIM.

Language Models

  • Llama 3.1 8B Base

  • Llama 3.1 8B Instruct

  • Llama 3.1 70B Instruct

New Features

Known Issues

  • vLLM profiles for Llama 3.1 models will fail with ValueError: Unknown RoPE scaling type extended.

  • NIM does not support Multi-instance GPU mode (MIG).

  • Release notes for Release 1.0 are located in the 1.0 documentation.

© Copyright 2024, NVIDIA Corporation. Last updated on Sep 9, 2024.