Release Notes
Summary
This is the latest release of NIM.
Language Models
Llama 3.1 8B Base
Llama 3.1 8B Instruct
Llama 3.1 70B Instruct
New Features
Chunked prefill
Experimental support for LS API
Known Issues
vLLM is not currently supported on Llama 3.1 models.
NIM does not support Multi-Instance GPU (MIG) mode.
Summary
This is the first general release of NIM.
Language Models
Llama 3 8B Instruct
Llama 3 70B Instruct
Mistral-7B-Instruct-v0.3
Mixtral-8x7B-v0.1
Mixtral-8x22B-v0.1
Known Issues
P-Tuning is not supported.
Empty metrics values on multi-GPU TensorRT-LLM models: the metrics gpu_cache_usage_perc, num_requests_running, and num_requests_waiting are not reported for multi-GPU TensorRT-LLM models, because TensorRT-LLM does not currently expose iteration statistics in orchestrator mode.
"No tokenizer found" error when running PEFT: this warning can be safely ignored.