Release Notes

TensorRT-Cloud CLI

Minor improvements will be continuously pushed without the expectation that you will need to upgrade.

Major changes can be expected at a monthly cadence with the expectation that you will upgrade your version of the CLI.

Until we release TensorRT-Cloud CLI 1.0, expect some API-breaking changes with new releases.

TensorRT-Cloud

Minor improvements will be continuously pushed to production to provide enhancements as soon as possible.

Major API-breaking changes will be announced clearly in the release notes. We expect to make API-breaking changes as we receive feedback from EA customers, and you will need to upgrade to a newer version of the CLI when an API-breaking change is released.

Backward compatibility will be considered for the GA release.

TensorRT-Cloud 0.6.1 Early Access (EA)

Key Features and Enhancements

The following features and enhancements have been added to this release:

  • Added support for TensorRT-LLM version 0.18.2.

Breaking API Changes

  • Windows native support has been disabled. Contact NVIDIA if this is important for your use case.

Fixed Issues

The following issues have been fixed in this release:

  • When the heuristic sampler fails, we now fall back to random sampling.

Limitations

  • Weight-stripped on-demand LLM engine building is only supported for TensorRT-LLM checkpoint inputs.

  • Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.) See the sketch after this list for one way to record the build GPU's SM version.

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list may be refitted with arbitrary weights; an illustrative argument list is shown below.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins or any custom ops are not supported. Only built-in TensorRT ops and plugins will work.

  • Invalid TensorRT-LLM engine build configurations, such as setting --tp-size > 1 for GPUs that are too small, may cause builds to fail.

  • A large search space can take a long time to evaluate and may cause the generated sweep overview to fail due to a timeout.

  • Windows sweeping is not supported.
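
The two refit-related limitations above can be prepared for up front. The following sketch is illustrative only: the compute_cap query field requires a reasonably recent NVIDIA driver, the exact way trtexec arguments are forwarded depends on your CLI version, and the file names are placeholders.

  # Record the compute capability (SM version) of the build GPU so a matching
  # GPU can be used later for refit.
  nvidia-smi --query-gpu=name,compute_cap --format=csv

  # Example trtexec argument list: including --refit requests a fully
  # refittable engine that can later be refitted with arbitrary weights.
  trtexec --onnx=model.onnx --refit --saveEngine=model.engine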

Known Issues

  • ONNX builds fail for Python 3.10 if the OpenSSL version installed on the system is too old. Run pip install pyOpenSSL --upgrade to work around this issue, as shown below.
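
To confirm whether this issue applies to a given environment, check the OpenSSL version that the Python interpreter was built against before applying the workaround (a minimal sketch):

  # Print the OpenSSL version linked into the Python 3.10 interpreter.
  python -c "import ssl; print(ssl.OPENSSL_VERSION)"

  # Workaround: upgrade pyOpenSSL.
  pip install pyOpenSSL --upgrade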

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.

TensorRT-Cloud 0.6.0 Early Access (EA)

Announcements

  • TensorRT-Cloud is now a service that manages the state of your sweeps and engine builds, enabling better handling of sweep requests.

Key Features and Enhancements

The following features and enhancements have been added to this release:

Breaking API Changes

  • TensorRT-LLM build request_id support has been removed.

Limitations

  • Weight-stripped on-demand LLM engine building is only supported for TensorRT-LLM checkpoint inputs.

  • Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list may be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins or any custom ops are not supported. Only built-in TensorRT ops and plugins will work.

  • Invalid TensorRT-LLM engine build configurations, such as setting --tp-size > 1 for GPUs that are too small, may cause builds to fail.

  • A large search space can take a long time to evaluate and may cause the generated sweep overview to fail due to a timeout.

  • Windows sweeping is not supported.

Known Issues

  • ONNX builds fail for Python 3.10 if the OpenSSL version installed on the system is too old. Run pip install pyOpenSSL --upgrade to work around this issue.

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.