Release Notes

Release 1.0.0

Summary

This is the first general release of NVIDIA NIM for LLMs.

Known Issues

Empty metrics values on multi-GPU TensorRT-LLM models

The metrics gpu_cache_usage_perc, num_requests_running, and num_requests_waiting are not reported for multi-GPU TensorRT-LLM models, because TensorRT-LLM currently doesn’t expose iteration statistics in orchestrator mode.
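
Monitoring code that reads these three metrics may need to tolerate their absence on multi-GPU deployments. The following is a minimal sketch, not part of NIM: it assumes the metrics arrive as Prometheus-style text exposition lines and that the exported names match the names in this note exactly (real deployments may add a prefix or labels). The function name read_optional_metrics is hypothetical.

```python
from typing import Dict, Optional

# Metric names as written in this release note; the exported names in a
# real deployment are an assumption and may carry a prefix.
OPTIONAL_METRICS = (
    "gpu_cache_usage_perc",
    "num_requests_running",
    "num_requests_waiting",
)


def read_optional_metrics(exposition_text: str) -> Dict[str, Optional[float]]:
    """Return each optional metric's value, or None when it is not reported."""
    values: Dict[str, Optional[float]] = {name: None for name in OPTIONAL_METRICS}
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="x"} value [timestamp]
        if "{" in line and "}" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            name, _, rest = line.partition(" ")
        name = name.strip()
        fields = rest.split()
        if name not in values or not fields:
            continue  # unrelated metric, or a sample with an empty value
        try:
            values[name] = float(fields[0])
        except ValueError:
            pass  # leave as None if the value is not numeric
    return values
```

A caller can then treat None as "not available" rather than as zero, which avoids alerting on a metric that was never emitted.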

No tokenizer found error when running PEFT

This message is a warning and should be ignored. It will be removed in a future release.
