Release Notes#
TensorRT-Cloud CLI
Minor improvements will be continuously pushed without the expectation that you will need to upgrade.
Major changes can be expected at a monthly cadence with the expectation that you will upgrade your version of the CLI.
Until we release TensorRT-Cloud CLI 1.0, expect some API-breaking changes with new releases.
TensorRT-Cloud
Minor improvements will be continuously pushed to production to provide enhancements as soon as possible.
Major API-breaking changes will be announced clearly in the release notes. We expect to make API-breaking changes as we incorporate feedback from EA customers; when an API-breaking change is released, you will need to upgrade to a newer version of the CLI.
Backward compatibility support will be considered for GA.
TensorRT-Cloud 0.5.3 Early Access (EA)#
Fixed Issues
The following issues have been fixed in this release:
Fixed a 504 error response when building large engines (~10 GB or larger).
An explicit list of supported Hugging Face repos was added to the documentation (refer to the Building a TensorRT-LLM Engine section).
Limitations
Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.
Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)
By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the `--refit` flag in the `trtexec` arg list may be refitted with arbitrary weights. Fully refittable engines might have some performance degradation.
Custom plugins or any custom ops are not supported. Only built-in TensorRT ops and plugins will work.
Input ONNX models must come from one of the following:
S3
GitHub
Local machine
Invalid `trt-llm` engine build configs may fail when `--tp-size > 1` is set for GPUs that are too small.
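As a sketch of the refit limitation above: producing a weight-stripped engine that can later be refitted with arbitrary weights requires forwarding `--refit` in the `trtexec` arg list. The invocation below is illustrative only; `--stripWeights` and `--refit` are real `trtexec` flags, but the `trt-cloud build onnx` subcommand shape and the `--trtexec-args` parameter name are assumptions for illustration, not confirmed CLI syntax.

```
# Illustrative sketch -- the trt-cloud flag names here are assumptions.
# Forwarding --refit in the trtexec arg list makes the weight-stripped
# engine refittable with arbitrary weights (with possible performance cost).
trt-cloud build onnx \
    --model ./model.onnx \
    --trtexec-args="--stripWeights --refit"

# Without --refit in the trtexec args, the resulting weight-stripped
# engine can only be refitted with the original ONNX weights.
```

Note that, per the limitation above, the refit itself must then be performed on a GPU with the same SM version as the one used to build the engine.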
For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.
TensorRT-Cloud 0.5.2 Early Access (EA)#
Key Features and Enhancements
The following features and enhancements have been added to this release:
Support for TensorRT-LLM version 0.16 was added.
Added live progress updates for monitoring build jobs.
Breaking API Changes
Support for TensorRT-LLM version 0.12 was removed.
Fixed Issues
The following issues have been fixed in this release:
Fixed bugs where some Windows TensorRT-LLM engine builds were incorrectly marked as failed.
Limitations
Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.
Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)
By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the `--refit` flag in the `trtexec` arg list may be refitted with arbitrary weights. Fully refittable engines might have some performance degradation.
Custom plugins or any custom ops are not supported. Only built-in TensorRT ops and plugins will work.
Input ONNX models must come from one of the following:
S3
GitHub
Local machine
Invalid `trt-llm` engine build configs may fail when `--tp-size > 1` is set for GPUs that are too small.
For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.
TensorRT-Cloud 0.5.1 Early Access (EA)#
Key Features and Enhancements
The following features and enhancements have been added to this release:
Support for TensorRT versions 10.6 and 10.7 was added.
Support for TensorRT-LLM versions 0.14 and 0.15 was added.
Support for more Llama, Gemma, and Mistral variants was added.
The EULA for building TensorRT and TensorRT-LLM engines was updated.
Breaking API Changes
Support for TensorRT-LLM version 0.11 was removed.
Limitations
Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.
Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)
By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the `--refit` flag in the `trtexec` arg list may be refitted with arbitrary weights. Fully refittable engines might have some performance degradation.
Custom plugins or any custom ops are not supported. Only built-in TensorRT ops and plugins will work.
Input ONNX models must come from one of the following:
S3
GitHub
Local machine
Invalid `trt-llm` engine build configs may fail when `--tp-size > 1` is set for GPUs that are too small.
For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.
TensorRT-Cloud 0.5.0 Early Access (EA)#
Key Features and Enhancements
The following features and enhancements have been added to this release:
Added multi-version support for TensorRT-LLM.
Engines can now be built using TensorRT-LLM version 0.11 or 0.12.
Added support for TensorRT versions 10.4 and 10.5.
Limitations
Return types with metrics are currently unsupported for checkpoint inputs with on-demand TensorRT-LLM builds.
Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.
Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)
By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the `--refit` flag in the `trtexec` arg list may be refitted with arbitrary weights. Fully refittable engines might have some performance degradation.
Custom plugins or any custom ops are not supported. Only built-in TensorRT ops and plugins will work.
Input ONNX models must come from one of the following:
S3
GitHub
Local machine
Invalid `trt-llm` engine build configs may fail when `--tp-size > 1` is set for GPUs that are too small.
For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.