Release Notes#

TensorRT-Cloud CLI

Minor improvements will be pushed continuously; you are not expected to upgrade for them.

Major changes can be expected at a monthly cadence, and you are expected to upgrade your version of the CLI when they occur.

Until we release TensorRT-Cloud CLI 1.0, expect some API-breaking changes with new releases.

TensorRT-Cloud

Minor improvements will be continuously pushed to production to provide enhancements as soon as possible.

Major API-breaking changes will be announced clearly in the release notes. We expect to make API-breaking changes as we receive feedback from EA customers, and we expect you to upgrade to a newer version of the CLI when an API-breaking change occurs.

Backward compatibility support will be considered for the GA release.

TensorRT-Cloud 0.5.3 Early Access (EA)#

Fixed Issues

The following issues have been fixed in this release:

  • Fixed a 504 error response when building large engines (~10 GB or larger).

  • An explicit list of supported Hugging Face repos was added to the documentation (refer to the Building a TensorRT-LLM Engine section).

Limitations

  • Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.

  • Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list can be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins and custom ops are not supported; only built-in TensorRT ops and plugins work.

  • Input ONNX models must come from one of the following:

    • S3

    • GitHub

    • Local machine

  • TensorRT-LLM engine build configurations that set --tp-size > 1 for GPUs that are too small are invalid and may fail.
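The refit limitations above can be sketched with trtexec. The --refit flag is named in the limitation text; --stripWeights and the file names are illustrative assumptions, so check trtexec --help for the flags your TensorRT version supports:

```shell
# Build a weight-stripped engine that stays refittable with arbitrary
# weights. Without --refit, a weight-stripped engine can only be
# refitted with the original ONNX weights (see the limitation above).
# Note that fully refittable engines might have some performance
# degradation.
trtexec --onnx=model.onnx \
        --stripWeights \
        --refit \
        --saveEngine=model_refittable.plan
```

Because refit requires a GPU of the same SM version used for the build, the refitting machine's compute capability must match the build GPU's.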

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.

TensorRT-Cloud 0.5.2 Early Access (EA)#

Key Features and Enhancements

The following features and enhancements have been added to this release:

  • Support for TensorRT-LLM version 0.16 was added.

  • Added live progress updates for monitoring build jobs.

Breaking API Changes

  • Support for TensorRT-LLM version 0.12 was removed.

Fixed Issues

The following issues have been fixed in this release:

  • Fixed bugs where some Windows TensorRT-LLM engine builds were incorrectly marked as failed.

Limitations

  • Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.

  • Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list can be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins and custom ops are not supported; only built-in TensorRT ops and plugins work.

  • Input ONNX models must come from one of the following:

    • S3

    • GitHub

    • Local machine

  • TensorRT-LLM engine build configurations that set --tp-size > 1 for GPUs that are too small are invalid and may fail.

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.

TensorRT-Cloud 0.5.1 Early Access (EA)#

Key Features and Enhancements

The following features and enhancements have been added to this release:

  • Support for TensorRT versions 10.6 and 10.7 was added.

  • Support for TensorRT-LLM versions 0.14 and 0.15 was added.

  • Support for more Llama, Gemma, and Mistral variants was added.

  • The EULA for building TensorRT and TensorRT-LLM engines was updated.

Breaking API Changes

  • Support for TensorRT-LLM version 0.11 was removed.

Limitations

  • Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.

  • Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list can be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins and custom ops are not supported; only built-in TensorRT ops and plugins work.

  • Input ONNX models must come from one of the following:

    • S3

    • GitHub

    • Local machine

  • TensorRT-LLM engine build configurations that set --tp-size > 1 for GPUs that are too small are invalid and may fail.

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.

TensorRT-Cloud 0.5.0 Early Access (EA)#

Key Features and Enhancements

The following features and enhancements have been added to this release:

  • Added multi-version support for TensorRT-LLM.

    • Engines can now be built using TensorRT-LLM version 0.11 or 0.12.

  • Added support for TensorRT versions 10.4 and 10.5.

Limitations

  • Build results that include metrics are currently unsupported for checkpoint inputs with on-demand TensorRT-LLM builds.

  • Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.

  • Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list can be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins and custom ops are not supported; only built-in TensorRT ops and plugins work.

  • Input ONNX models must come from one of the following:

    • S3

    • GitHub

    • Local machine

  • TensorRT-LLM engine build configurations that set --tp-size > 1 for GPUs that are too small are invalid and may fail.

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.