1. JetPack 4.4 Developer Preview

1.1. New Features

This release adds support for the Jetson Xavier NX module, and includes new versions of CUDA, TensorRT, and cuDNN. Support for Vulkan 1.2 has been added as well.

In addition, DeepStream 5.0 Developer Preview is supported in this release.

2. Additional Release Details

2.1. OS

L4T 32.4.2

  • Vulkan 1.2 support

  • Support for upgrading L4T version using the Debian package management tool [1]

  • Support for Generic Timestamping Engine (GTE) for Jetson AGX Xavier and Jetson Xavier NX

    • Kernel driver for GTE

    • Sample GPIO driver using GTE for timestamping input state transitions

    Refer to the GTE section in the NVIDIA Jetson Linux Developer Guide.

  • Support for Dynamic Frequency Scaling (DFS) for VIC using actmon.

  • Samples demonstrating the hardware-backed authentication and encryption capabilities of Jetson TX2, Jetson AGX Xavier, and Jetson Xavier NX.

    Refer to the Security section in the NVIDIA Jetson Linux Developer Guide.

  • Utility to burn fuses on multiple Jetson devices simultaneously.

  • For Jetson Nano and Jetson Xavier NX Developer Kits:

    • Option to select APP partition size on the microSD card during initial configuration at first boot.

[1] Debian package-based L4T upgrade is supported only starting with L4T version 32.3.1.

2.2. Libraries and APIs

CUDA 10.2

  • Performance optimization through user-mode submits.

    50% launch latency reduction for CUDA kernels, resulting in improved GPU utilization and lower CPU utilization.

TensorRT 7.1.0 (Developer Preview)

  • New Layers

    1. IFillLayer: The IFillLayer is used to generate an output tensor with the specified mode.

    2. IIteratorLayer: The IIteratorLayer enables a loop to iterate over a tensor. A loop is defined by loop boundary layers.

    3. ILoopBoundaryLayer: Class ILoopBoundaryLayer defines a virtual method getLoop() that returns a pointer to the associated ILoop.

    4. ILoopOutputLayer: The ILoopOutputLayer specifies an output from the loop.

    5. IParametricReluLayer: The IParametricReluLayer represents a parametric ReLU operation, i.e., a leaky ReLU where the slopes for x < 0 can differ per element.

    6. IRecurrenceLayer: The IRecurrenceLayer specifies a recurrent definition.

    7. ISelectLayer: The ISelectLayer returns either of the two inputs depending on the condition.

    8. ITripLimitLayer: The ITripLimitLayer specifies how many times the loop iterates.

  • New Operators

    Expanded support for ONNX operations: Added ConstantOfShape, DequantizeLinear, Equal, Erf, Expand, Greater, GRU, Less, Loop, LRN, LSTM, Not, PRelu, QuantizeLinear, RandomUniform, RandomUniformLike, Range, RNN, Scan, Sqrt, Tile, and Where.

  • New Samples

    sampleAlgorithmSelector shows how to use the algorithm selection API, based on sampleMNIST. This sample demonstrates the use of IAlgorithmSelector to deterministically build TensorRT engines.

  • Working with Loops

    TensorRT supports loop-like constructs, which can be useful for recurrent networks. TensorRT loops support scanning over input tensors, recurrent definitions of tensors, and both "scan outputs" and "last value" outputs.
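
As a rough illustration of these semantics (not the TensorRT API), here is a minimal pure-Python sketch of a loop that scans over an input tensor, carries a recurrence, honors an optional kCOUNT-style trip limit, and produces both a "scan output" and a "last value" output; the function name and structure are illustrative only:

```python
def run_loop(xs, init, step, trip_limit=None):
    """Emulate TensorRT-style loop semantics in plain Python:
    - iterator: scan over the elements of the input tensor `xs`
    - recurrence: carry `state` across iterations, seeded by `init`
    - trip limit: optional kCOUNT-style cap on iteration count
    - loop outputs: return both the "scan output" (all per-iteration
      values, concatenated) and the "last value" output."""
    state = init
    scan_out = []
    for i, x in enumerate(xs):
        if trip_limit is not None and i >= trip_limit:
            break
        state = step(state, x)
        scan_out.append(state)
    return scan_out, state

# Running sum over a 1-D "tensor": the scan output holds every partial
# sum, while the last-value output is just the final sum.
scan, last = run_loop([1, 2, 3, 4], init=0, step=lambda s, x: s + x)
assert scan == [1, 3, 6, 10] and last == 10
```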

  • ONNX parser with dynamic shapes support

    The ONNX parser supports full-dimensions mode only. Your network definition must be created with the explicitBatch flag set.

  • BERT INT8 and mixed precision optimizations

    In the BERT model, some GEMM layers are followed by GELU activations. Because TensorRT does not provide a dedicated IMMA GEMM layer, you can implement those GEMM layers with either IConvolutionLayer or IFullyConnectedLayer, depending on the precision you require. For example, you can use IConvolutionLayer with H == W == 1 (CONV1x1) to implement a FullyConnected operation and leverage IMMA math in INT8 mode. TensorRT supports the fusion of Convolution/FullyConnected and GELU.
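
To see why the two formulations are interchangeable, here is a small pure-Python sketch (illustrative only, not TensorRT code) showing that a FullyConnected operation and a 1x1 convolution over a 1x1 spatial map compute the same per-output-channel dot products:

```python
def fully_connected(w, x):
    # w: [out_ch][in_ch], x: [in_ch] -> output vector of length out_ch.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def conv1x1(w, x):
    # Treat x as an [in_ch][H=1][W=1] feature map; a 1x1 convolution
    # then reduces to the same dot product per output channel.
    out = []
    for row in w:
        acc = 0.0
        for c in range(len(x)):
            acc += row[c] * x[c][0][0]   # the single kernel position (0, 0)
        out.append(acc)
    return out

w = [[1.0, 2.0], [3.0, -1.0]]
x_vec = [5.0, 7.0]
x_map = [[[5.0]], [[7.0]]]               # same data, shaped [in_ch][1][1]
assert fully_connected(w, x_vec) == conv1x1(w, x_map)  # both [19.0, 8.0]
```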

  • Working with Quantized Networks

    TensorRT supports quantized models trained with Quantization Aware Training. Support is limited to symmetrically quantized models, that is, models with zero_point = 0 in their QuantizeLinear and DequantizeLinear operations.
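
The symmetric scheme can be sketched in a few lines of plain Python (illustrative only; actual INT8 execution happens inside TensorRT):

```python
def quantize_linear(x, scale):
    # Symmetric INT8 quantization: zero_point is fixed at 0, so
    # q = clamp(round(x / scale), -128, 127).
    q = round(x / scale)
    return max(-128, min(127, q))

def dequantize_linear(q, scale):
    # Inverse mapping: x_hat = q * scale (again with zero_point = 0).
    return q * scale

scale = 0.1
q = quantize_linear(3.14, scale)        # 31
x_hat = dequantize_linear(q, scale)     # ~3.1, within one quantization step
assert abs(x_hat - 3.14) <= scale
```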

  • Boolean Tensor Support

    TensorRT supports Boolean tensors, which can be marked as network inputs and outputs. IElementWiseLayer, IUnaryLayer (kNOT only), IShuffleLayer, ITripLimitLayer (kWHILE only), and ISelectLayer support the Boolean data type. Boolean tensors can be used only with FP32 and FP16 precision networks.
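
The select operation driven by a Boolean tensor is simple to state; here is a pure-Python sketch of the elementwise semantics (illustrative, not the TensorRT API):

```python
def select(condition, then_t, else_t):
    # Elementwise select, in the spirit of ISelectLayer (or ONNX Where):
    # output[i] = then_t[i] if condition[i] else else_t[i].
    return [t if c else e for c, t, e in zip(condition, then_t, else_t)]

cond = [True, False, True]     # a Boolean "tensor"
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
assert select(cond, a, b) == [1.0, 20.0, 3.0]
```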

  • Working with empty tensors

    TensorRT supports empty tensors. A tensor is an empty tensor if it has one or more dimensions with length zero.

  • Builder layer timing cache

    The layer timing cache stores layer profiling information gathered during the builder phase. Models with repeated layers will see a significant speedup in build time.

  • Pointwise fusion based on code generation

    Pointwise fusion is updated to use code generation and runtime compilation to further improve performance.

  • Dilation support for deconvolution

    IDeconvolutionLayer now supports a dilation parameter. This is accessible through the C++ API, Python API, and the ONNX parser.
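
Under common conventions (assumed here, since the formula is not stated in this document; it matches, e.g., PyTorch's ConvTranspose2d with zero output padding), the effect of dilation on a deconvolution's output size can be sketched as:

```python
def deconv_output_size(in_size, kernel, stride=1, pad=0, dilation=1):
    # Transposed-convolution output-size formula with dilation:
    # out = (in - 1) * stride - 2 * pad + dilation * (kernel - 1) + 1
    # Dilation widens the effective kernel to dilation * (kernel - 1) + 1.
    return (in_size - 1) * stride - 2 * pad + dilation * (kernel - 1) + 1

# Without dilation: a 3-wide kernel at stride 2 maps 4 -> 9.
assert deconv_output_size(4, kernel=3, stride=2) == 9
# Dilation 2 widens the effective kernel from 3 to 5, so the output grows.
assert deconv_output_size(4, kernel=3, stride=2, dilation=2) == 11
```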

  • Selecting FP16 and INT8 kernels

    TensorRT supports mixed-precision inference with FP32, FP16, or INT8 precision. Depending on hardware support, you can enable any of these precisions to accelerate inference.

  • Calibration with dynamic shapes

    INT8 calibration with dynamic shapes supports the same functionality as a standard INT8 calibrator but for networks with dynamic shapes.

  • Algorithm selection

    Algorithm selection provides a mechanism to select and report the algorithms used for different layers in a network. It can also be used to deterministically build a TensorRT engine or to reproduce the same layer implementations across engines.

    For detailed changes, refer to the TensorRT 7.X.X release notes.

cuDNN 8.0.0 (Developer Preview)

  • The cuDNN library has been split into separate inference and training libraries, enabling applications to link only against the cuDNN sub-components they need.

VPI 0.2.0 (Developer Preview)

  • Performance optimization of algorithms introduced in VPI 0.1.0: up to 45x on GPU and up to 90x on CPU backends.

    Refer to the VPI documentation for benchmarks.

  • New Image FFT, Image iFFT, and Image Format Converter algorithms added with support for CPU and GPU backends. (The PVA backend will be supported in a future release.)
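
As a conceptual sketch of what an FFT/iFFT pair computes, here is a naive O(N²) DFT round trip in plain Python (VPI's implementations are of course optimized and operate on images, not 1-D lists):

```python
import cmath

def dft(signal):
    # Naive discrete Fourier transform: X[j] = sum_k x[k] * e^(-2*pi*i*j*k/N).
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n)) for j in range(n)]

def idft(spectrum):
    # Inverse DFT ("iFFT"): conjugate kernel plus 1/N normalization,
    # so that idft(dft(x)) recovers x.
    n = len(spectrum)
    return [sum(spectrum[j] * cmath.exp(2j * cmath.pi * j * k / n)
                for j in range(n)) / n for k in range(n)]

x = [1.0, 2.0, 3.0, 4.0]
roundtrip = idft(dft(x))
assert all(abs(r - v) < 1e-9 for r, v in zip(roundtrip, x))
```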

2.3. Developer Tools

  • NVIDIA Nsight Systems 2020.2 for application profiling across GPU and CPU.

    • Enhanced data analysis with option to export to SQLite, HDF5 or JSON

    • Support for sampling Xavier PMU extensions

    • Reduced NVTX overhead

    • New CLI support for profiling on devices with intermittent network connectivity

  • NVIDIA Nsight Graphics 2020.1 for graphics application debugging and profiling.

    • Added support to save and load custom named layouts

    • Improved events view display and filtering

    • Enhanced support for mixed DPI monitor scaling

    • Added precise control of pixel position in the resources view for launching pixel history

    • Improved Vulkan action profiling information by fixing GPU clocks when running profiling experiments

    • Added support for new Vulkan extensions:

      VK_EXT_line_rasterization
      VK_EXT_headless_surface
      VK_KHR_create_renderpass2
      VK_KHR_external_fence_win32
      VK_KHR_external_memory_win32
      VK_KHR_external_semaphore
      VK_KHR_imageless_framebuffer
      VK_NVX_image_view_handle

  • NVIDIA Nsight Compute 2019.3 for CUDA kernel profiling.

    • The NVIDIA Nsight Compute version is unchanged from JetPack 4.3

