Release Notes#
TensorRT-Cloud CLI
Minor improvements will be continuously pushed without the expectation that you will need to upgrade.
Major changes can be expected at a monthly cadence, and you will be expected to upgrade your version of the CLI when they occur.
Until we release TensorRT-Cloud CLI 1.0, expect some API-breaking changes with new releases.
TensorRT-Cloud
Minor improvements will be continuously pushed to production to provide enhancements as soon as possible.
Major API-breaking changes will be announced clearly in the release notes. We expect to make API-breaking changes as we incorporate feedback from EA customers, and you will need to upgrade to a newer version of the CLI when an API-breaking change is released.
Backward compatibility support will be considered for GA.
TensorRT-Cloud 0.6.1 Early Access (EA)#
Key Features and Enhancements
The following features and enhancements have been added to this release:
Added support for TensorRT-LLM version 0.18.2.
Breaking API Changes
Windows native support has been disabled. Contact NVIDIA if this is important for your use case.
Fixed Issues
The following issues have been fixed in this release:
When a heuristic sampler fails, we now use random sampling.
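The fallback can be pictured with a minimal sketch; heuristic_sample and the search space below are hypothetical stand-ins for illustration, not the TensorRT-Cloud sweeper's actual implementation or API.

import random

def heuristic_sample(space):
    # Stand-in for a heuristic sampler that may fail on some search spaces.
    raise ValueError("heuristic could not propose a candidate")

def sample(space):
    try:
        return heuristic_sample(space)
    except ValueError:
        # Fall back to uniform random sampling when the heuristic fails.
        return {name: random.choice(values) for name, values in space.items()}

space = {"max_batch_size": [1, 4, 8], "max_num_tokens": [2048, 4096]}
print(sample(space))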
Limitations
Weight-stripped on-demand LLM engine building is only supported for TensorRT-LLM checkpoint inputs.
Refit requires a GPU of the same SM version as the one used to build the engine. (This is a TensorRT limitation; a quick compatibility check is sketched after this list.)
By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list may be refitted with arbitrary weights. Fully refittable engines might have some performance degradation.
Custom plugins or any custom ops are not supported. Only built-in TensorRT ops and plugins will work.
Invalid trt-llm engine build configs may fail when setting --tp-size > 1 for GPUs that are too small.
A large search space will take a while to evaluate and may cause the generated sweep overview to fail due to a timeout.
Windows sweeping is not supported.
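A quick local check of the SM (compute capability) match described above can look like the following sketch. It assumes a CUDA-enabled PyTorch install, and the ENGINE_SM value is an example; substitute the SM of the GPU used for the cloud build.

import torch

ENGINE_SM = (8, 9)  # example: engine built for an SM 8.9 GPU (assumption)

major, minor = torch.cuda.get_device_capability(0)
if (major, minor) != ENGINE_SM:
    raise RuntimeError(
        f"Local GPU is SM {major}.{minor}, but the engine targets "
        f"SM {ENGINE_SM[0]}.{ENGINE_SM[1]}; refit requires a matching SM."
    )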
Known Issues
ONNX builds fail for Python 3.10 when the installed OpenSSL version is too low. Run pip install pyOpenSSL --upgrade to work around this issue.
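As a quick way to confirm whether this applies to your environment, print the OpenSSL version that your Python interpreter is linked against (standard-library snippet, shown for illustration):

import ssl

# Reports the OpenSSL build this interpreter links against.
print(ssl.OPENSSL_VERSION)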
For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.
TensorRT-Cloud 0.6.0 Early Access (EA)#
Announcements
TensorRT-Cloud is now a stateful service that tracks your sweeps and engine builds, allowing for better management of sweep requests.
Key Features and Enhancements
The following features and enhancements have been added to this release:
Added support for hyper-parameter sweeping for TensorRT-LLM models. For more information, refer to the Sweeping for Optimized TensorRT-LLM Engines section; a rough illustration of a sweep search space follows this list.
Added DevZone-based account creation. For more information, refer to the Requesting a TensorRT-Cloud Enabled NGC Org section.
Added support for credits to limit engine build and sweep usage. For more information, refer to the Usage Credits section.
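As a rough illustration of what a sweep's hyper-parameter search space covers, the sketch below enumerates candidate build parameters. The parameter names and values are assumptions made for illustration only; the actual configuration format is described in the Sweeping for Optimized TensorRT-LLM Engines section.

# Hypothetical search space; parameter names are illustrative, not the real schema.
search_space = {
    "max_batch_size": [1, 8, 32],
    "max_num_tokens": [2048, 8192],
    "quantization": ["fp16", "fp8"],
}

# Candidate engine builds if every combination were evaluated.
num_candidates = 1
for values in search_space.values():
    num_candidates *= len(values)
print(f"{num_candidates} candidate builds")  # 12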
Breaking API Changes
TensorRT-LLM build request_id support has been removed.
Limitations
Weight-stripped on-demand LLM engine building is only supported for TensorRT-LLM checkpoint inputs.
Refit requires a GPU of the same SM version used to build the engine. (This is a TensorRT limitation.)
By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines built with the --refit flag in the trtexec argument list may be refitted with arbitrary weights. Fully refittable engines might have some performance degradation.
Custom plugins or any custom ops are not supported. Only built-in TensorRT ops and plugins will work.
Invalid trt-llm engine build configs may fail when setting --tp-size > 1 for GPUs that are too small.
A large search space will take a while to evaluate and may cause the generated sweep overview to fail due to a timeout.
Windows sweeping is not supported.
Known Issues
ONNX builds fail for Python 3.10 when the installed OpenSSL version is too low. Run pip install pyOpenSSL --upgrade to work around this issue.
For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.