Release Notes#

TensorRT-Cloud CLI

Minor improvements will be pushed continuously; you are not expected to upgrade for them.

Major changes can be expected at a monthly cadence, and you are expected to upgrade your version of the CLI when they occur.

Until we release TensorRT-Cloud CLI 1.0, expect some API-breaking changes with new releases.

TensorRT-Cloud

Minor improvements will be continuously pushed to production to provide enhancements as soon as possible.

Major API-breaking changes will be announced clearly in the release notes. We expect to make API-breaking changes as we receive feedback from EA customers, and we expect you to upgrade to a newer version of the CLI when an API-breaking change occurs.

Backward compatibility support will be considered for the GA release.

TensorRT-Cloud 0.5.3 Early Access (EA)#

Fixed Issues

The following issues have been fixed in this release:

  • Fixed a 504 error response when building large engines (~10 GB or larger).

  • An explicit list of supported Hugging Face repos was added to the documentation (refer to the Building a TensorRT-LLM Engine section).

Limitations

  • Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.

  • Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list can be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins and custom ops are not supported; only built-in TensorRT ops and plugins work.

  • Input ONNX models must come from one of the following:

    • S3

    • GitHub

    • Local machine

  • TensorRT-LLM engine build configurations that set --tp-size > 1 for GPUs that are too small are invalid and may fail.
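The refit limitations above can be sketched with trtexec. The --refit flag is named in the limitation text; --stripWeights and the file names are illustrative assumptions, so check trtexec --help for the flags your TensorRT version supports:

```shell
# Build a weight-stripped engine that stays refittable with arbitrary
# weights. Without --refit, a weight-stripped engine can only be
# refitted with the original ONNX weights (see the limitation above).
# Note that fully refittable engines might have some performance
# degradation.
trtexec --onnx=model.onnx \
        --stripWeights \
        --refit \
        --saveEngine=model_refittable.plan
```

Because refit requires a GPU of the same SM version used for the build, the refitting machine's compute capability must match the build GPU's.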

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.

TensorRT-Cloud 0.5.2 Early Access (EA)#

Key Features and Enhancements

The following features and enhancements have been added to this release:

  • Support for TensorRT-LLM version 0.16 was added.

  • Added live progress updates for monitoring build jobs.

Breaking API Changes

  • Support for TensorRT-LLM version 0.12 was removed.

Fixed Issues

The following issues have been fixed in this release:

  • Fixed bugs where some Windows TensorRT-LLM engine builds were incorrectly marked as failed.

Limitations

  • Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.

  • Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list can be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins and custom ops are not supported; only built-in TensorRT ops and plugins work.

  • Input ONNX models must come from one of the following:

    • S3

    • GitHub

    • Local machine

  • TensorRT-LLM engine build configurations that set --tp-size > 1 for GPUs that are too small are invalid and may fail.

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.

TensorRT-Cloud 0.5.1 Early Access (EA)#

Key Features and Enhancements

The following features and enhancements have been added to this release:

  • Support for TensorRT versions 10.6 and 10.7 was added.

  • Support for TensorRT-LLM versions 0.14 and 0.15 was added.

  • Support for more Llama, Gemma, and Mistral variants was added.

  • The EULA for building TensorRT and TensorRT-LLM engines was updated.

Breaking API Changes

  • Support for TensorRT-LLM version 0.11 was removed.

Limitations

  • Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.

  • Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list can be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins and custom ops are not supported; only built-in TensorRT ops and plugins work.

  • Input ONNX models must come from one of the following:

    • S3

    • GitHub

    • Local machine

  • TensorRT-LLM engine build configurations that set --tp-size > 1 for GPUs that are too small are invalid and may fail.

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.

TensorRT-Cloud 0.5.0 Early Access (EA)#

Key Features and Enhancements

The following features and enhancements have been added to this release:

  • Added multi-version support for TensorRT-LLM.

    • Engines can now be built using TensorRT-LLM version 0.11 or 0.12.

  • Added support for TensorRT versions 10.4 and 10.5.

Limitations

  • Build results that include metrics are currently unsupported for checkpoint inputs with on-demand TensorRT-LLM builds.

  • Weight-stripped on-demand LLM engine building is only supported for checkpoint inputs.

  • Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list can be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins and custom ops are not supported; only built-in TensorRT ops and plugins work.

  • Input ONNX models must come from one of the following:

    • S3

    • GitHub

    • Local machine

  • TensorRT-LLM engine build configurations that set --tp-size > 1 for GPUs that are too small are invalid and may fail.

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.