Release Notes

TensorRT-Cloud CLI

Minor improvements will be pushed continuously; you are not expected to upgrade for these.

Major changes can be expected at a monthly cadence, and you will be expected to upgrade your version of the CLI when they land.

Until we release TensorRT-Cloud CLI 1.0, expect some API-breaking changes with new releases.

TensorRT-Cloud

Minor improvements will be continuously pushed to production to provide enhancements as soon as possible.

Major API-breaking changes will be announced clearly in the release notes. We expect to make some API-breaking changes as we receive feedback from EA customers. When an API-breaking change lands, you will be expected to upgrade to a newer version of the CLI.

Backward-compatibility support will be considered for GA.

TensorRT-Cloud 0.2.0 Early Access (EA)

Announcements

  • The TensorRT-Cloud CLI tool is now available on PyPI.

Key Features and Enhancements

The following features and enhancements have been added to this release:

  • Added support for access to pre-built engines through TensorRT-Cloud.

  • Added support for more NVIDIA GeForce GPUs. For more information, refer to Planned GPU Support.

Breaking API Changes

  • CLI flags:

    • trt-cloud build --weightless has been renamed to trt-cloud build --strip-weights.

    • trt-cloud build --strip-weights (formerly --weightless) no longer performs refit automatically. Refit is now opt-in via the --local-refit flag.
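The flag changes above can be sketched as follows. This is illustrative only: the model argument syntax is a placeholder, so consult trt-cloud build --help for the exact invocation.

```shell
# Old (0.1.x): --weightless built a stripped engine and refit ran automatically.
#   trt-cloud build --weightless model.onnx

# New (0.2.0): strip weights at build time, then explicitly opt in to local refit.
trt-cloud build --strip-weights --local-refit model.onnx
```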

Limitations

  • Input model files have a maximum file size of 5 GB.

    • This will be fixed in future releases. For now, models larger than 5 GB should use the weight-stripped flow. Refer to the Weight-Stripped Engine Generation section for more information.

  • Refit requires a GPU of the same SM version as was used to build the engine. (This is a TensorRT limitation.)

  • By default, weight-stripped engines must be refitted with the original ONNX weights. Only engines that were built with the --refit flag in the trtexec arg list may be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins or any custom ops are not supported. Only built-in TensorRT ops and plugins will work.

  • Input ONNX models must come from one of the following:

    • S3

    • GitHub

    • Local machine

  • The TensorRT-Cloud server has a daily limit on the amount of data it can process for building engines on Windows. If TensorRT-Cloud hits this limit on a given day, then building on Windows will not be available for the rest of the day.
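The refit limitation above can be sketched as follows. Note that the option for forwarding trtexec arguments shown here is hypothetical; the real CLI option may be named differently, so check trt-cloud build --help.

```shell
# Default: a weight-stripped engine may only be refitted with the ORIGINAL ONNX weights.
# To allow refitting with arbitrary weights later, the engine must be built with
# trtexec's --refit flag. The --trtexec-args option below is an assumed, illustrative
# name for the mechanism that forwards arguments to trtexec.
trt-cloud build --strip-weights --trtexec-args="--refit" model.onnx
```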

Known Issues

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.

TensorRT-Cloud 0.1.1 Early Access (EA)

Announcements

  • The TensorRT-Cloud CLI tool will be published to PyPI in the near future.

Key Features and Enhancements

The following features and enhancements have been added to this release:

  • Added support for on-demand ONNX TensorRT engine builds for closed EA accounts.

  • Added support for building TensorRT engines on a wide variety of NVIDIA GeForce GPUs. For more information, refer to Planned GPU Support.

Limitations

  • Input model files have a maximum file size of 5 GB.

    • This will be fixed in future releases. For now, models larger than 5 GB should use the weightless flow. Refer to the Weight-Stripped Engine Generation section for information on weightless engine building.

  • Refit requires a GPU of the same SM version as was used to build the engine. (This is a TensorRT limitation.)

  • By default, weightless engines must be refitted with the original ONNX weights. Only engines that were built with the --refit flag in the trtexec arg list may be refitted with arbitrary weights.

  • Fully refittable engines might have some performance degradation.

  • Custom plugins or any custom ops are not supported. Only built-in TensorRT ops and plugins will work.

  • Input ONNX models must come from one of the following:

    • S3

    • GitHub

    • Local machine

  • The TensorRT-Cloud server has a daily limit on the amount of data it can process for building engines on Windows. If TensorRT-Cloud hits this limit on a given day, then building on Windows will not be available for the rest of the day.

Known Issues

For inquiries and to report issues, contact tensorrt-cloud-contact@nvidia.com.