TensorFlow For Jetson Platform

This document describes the key features, software enhancements and improvements, and known issues for TensorFlow 1.15.3 and 2.3.0 on the Jetson platform.

Key Features and Enhancements

This release includes the following key features and enhancements.
  • As of the 20.02 release, the TensorFlow package has been renamed from tensorflow-gpu to tensorflow. This change affects only the package's installation and its name within the Python package manager, not how TensorFlow is imported or accessed once installed on the system. To upgrade properly, you may need to uninstall the tensorflow-gpu package before installing tensorflow, as having both installed simultaneously leads to undefined behavior.

  • TensorFlow 2.0 is now available for installation. See TensorFlow's Effective TensorFlow 2 guide for details about the update. Note that TensorFlow 2.x is not fully compatible with TensorFlow 1.x releases; code written for the older framework may not work with the newer package.

  • Bug fixes and improvements for TF-TRT. For more information, see the TensorFlow-TensorRT (TF-TRT) User Guide and the TensorFlow Container Release Notes.

Compatibility

Table 1. TensorFlow compatibility with NVIDIA containers and JetPack

TensorFlow Version    NVIDIA TensorFlow Container    JetPack Version
2.3.0                 20.09                          4.4
2.2.0                 20.08, 20.07, 20.06            4.4
2.1.0                 20.04                          4.4
2.1.0                 20.03, 20.02                   4.3
1.15.3                20.09, 20.08, 20.07            4.4
1.15.2                20.06, 20.04                   4.4
1.15.2                20.03, 20.02                   4.3

Note: the older packages below are installed as tensorflow-gpu; the more recent releases above are installed as tensorflow.

2.0.0                 20.01, 19.12                   4.3
2.0.0                 19.11                          4.2.x
1.15.0                20.01, 19.12                   4.3
1.15.0                19.11                          4.2.x
1.14.0                19.10, 19.09, 19.07            4.2.x
1.14.0                19.09                          3.3.1
1.13.1                19.05, 19.04, 19.03            4.2.x
Python Package Dependencies: The current TensorFlow package was built in a Python 3 environment with the following pip packages installed at the specified versions:
numpy==1.16.1, future==0.17.1, mock==3.0.5, h5py==2.9.0, gast==0.2.2, keras_preprocessing==1.0.5, keras_applications==1.0.8, scipy==1.4.1
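As a quick sanity check, the installed versions of these dependencies can be compared against the pinned list. The sketch below uses pkg_resources (which ships with setuptools); the dictionary simply repeats the versions listed above.

```python
# Sanity-check installed dependency versions against the pinned list above.
# pkg_resources ships with setuptools, so no extra install is needed.
import pkg_resources

pinned = {
    "numpy": "1.16.1",
    "future": "0.17.1",
    "mock": "3.0.5",
    "h5py": "2.9.0",
    "gast": "0.2.2",
    "scipy": "1.4.1",
}

for name, wanted in pinned.items():
    try:
        installed = pkg_resources.get_distribution(name).version
        status = "OK" if installed == wanted else "differs ({} installed)".format(installed)
    except pkg_resources.DistributionNotFound:
        status = "not installed"
    print("{}=={}: {}".format(name, wanted, status))
```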

Using TensorFlow With The Jetson Platform

Memory

If you observe out-of-memory problems in TensorFlow, you can use a custom configuration to limit the amount of memory TensorFlow tries to allocate: either allow the GPU memory allocation to grow as needed, or set a hard limit on the amount of memory the allocator will attempt to use. Depending on which version of the framework you are using, see either the TensorFlow 2 guide or the archived TensorFlow 1.x documentation for details.
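On TensorFlow 2.x, the two approaches look roughly like this (a sketch; the 2048 MB limit is an arbitrary example value, and the TensorFlow 1.x equivalents live on tf.ConfigProto's gpu_options):

```python
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Option 1: let the GPU memory allocation grow on demand instead of
    # reserving all device memory up front.
    tf.config.experimental.set_memory_growth(gpus[0], True)

    # Option 2 (use instead of option 1, not together with it): set a hard
    # limit on the allocator (here 2048 MB, an arbitrary example value).
    # tf.config.experimental.set_virtual_device_configuration(
    #     gpus[0],
    #     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])
```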

Storage

If you need more storage, we recommend connecting an external SSD via SATA on TX2 or Xavier devices, or USB on Jetson Nano.

Operators

In TensorFlow 1.x, if you want to see which device each operator of your graph is placed on, use tf.ConfigProto(log_device_placement=True) to log all device placements.
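A minimal sketch of this, written against tf.compat.v1 so the same code also runs on a 2.x install (on a plain 1.x install, import tensorflow as tf works directly):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # no-op in 1.x graph mode; required on 2.x

# log_device_placement=True makes TensorFlow log, for every operator,
# which device (CPU or GPU) it was assigned to.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0], name='a')
    b = tf.constant([3.0, 4.0], name='b')
    result = sess.run(a + b)  # placements are printed to stderr

print(result)
```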

Known Issues

  • A previous version of the TensorFlow 2.1.0 package corresponding to the 20.04 release was not compiled for the correct version of TensorFlow and has been replaced by a fixed version. The problematic package can be identified by running tensorflow.__version__ after importing TensorFlow: if no error is thrown, the installed version is correct; otherwise, upgrade to the fixed package or to a newer TensorFlow release altogether.
  • In TensorFlow 1.14, when using NumPy 1.17 there is a known issue where importing TensorFlow will print out a number of FutureWarning messages. This does not affect performance and can be avoided by installing NumPy version 1.16 instead of 1.17.
  • Certain tasks with large memory requirements may fail due to the memory limitations of embedded systems. Creating a swap partition will effectively enable your device to make use of more memory at the cost of speed and storage space, and may solve the problem.
  • When accelerating inference in TensorFlow with TensorRT (TF-TRT), you may experience problems with tf.estimator and the standard allocator (the BFC allocator), which can cause crashes and nondeterministic accuracy. If you run into such issues, switch to cuda_malloc as the allocator (export TF_GPU_ALLOCATOR="cuda_malloc").
  • Convolutions with a large kernel size using native TensorFlow are slow because the SwapDimension0And2InTensor3Simple kernel has no tiled implementation. This will be resolved in a future release; in the meantime, use TF-TRT to speed up such convolutions.
  • The TF-TRT conversion sometimes crashes without any error message. This usually happens when the OS runs out of swap memory and kills processes to reclaim it. If you encounter this issue, add more swap memory.
  • Some NVIDIA tools, for example, nvidia-smi, are not supported on the Jetson platform. For more information, see the JetPack documentation.
  • Jetson AGX Xavier has 16GB of memory which is shared between the CPU and GPU. As a result, some applications that work on a GPU with 16GB of memory may not work on Jetson AGX Xavier because not all of the 16GB is available to the GPU.
  • The following TensorFlow unit tests will fail on Jetson AGX Xavier. Some of these failures are due to bugs in the test scripts, which affect only the AArch64 platform.
    • //tensorflow/core/kernels:sparse_matmul_op_test_gpu
    • //tensorflow/core/kernels:requantize_op_test
    • //tensorflow/core/kernels:quantized_bias_add_op_test
    • //tensorflow/core:util_tensor_slice_set_test
    • //tensorflow/core:platform_stacktrace_handler_test
    • //tensorflow/core:platform_profile_utils_cpu_utils_test
    • //tensorflow/core:common_runtime_gpu_gpu_device_test
    • //tensorflow/core:gpu_device_unified_memory_test_gpu
    • //tensorflow/core:common_runtime_direct_session_with_debug_test
    • //tensorflow/core:device_tracer_test
    • //tensorflow/core:framework_unique_tensor_references_test
    • //tensorflow/python:basic_session_run_hooks_test
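The TF_GPU_ALLOCATOR workaround mentioned above can also be applied from within Python, as long as the variable is set before TensorFlow initializes its GPU allocator (a sketch):

```python
import os

# Must be set before TensorFlow creates its GPU allocator, so do this
# before importing TensorFlow (or at least before any GPU work).
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc"

import tensorflow as tf  # noqa: E402  (import after the env var on purpose)
```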