Machine Learning Workflow

Machine Learning (ML) Workflow is a collection of code and samples intended to speed up adoption of ML with the Isaac SDK. These samples use the Tensorflow framework for training, but the same principles and code should also work with other ML frameworks such as PyTorch.

Training data is hard to collect and harder to label. Incorporating synthesized data from simulation can accelerate the process. The Free Space Segmentation section describes training sample ML models from synthesized data with Unity3D.

Three possible runtimes are provided in the Isaac SDK to perform inference with trained ML models on both Jetson TX2/Xavier and PC. All samples can be compiled and run on both platforms.

Inference with Tensorflow

Tensorflow is a popular ML framework from Google which is used for training in the samples presented here. The Isaac SDK also works with the Tensorflow runtime to perform inference with the trained model as-is.

The Stereo Depth DNN section presents sample applications for performing inference with Tensorflow.

For more information about working with Tensorflow in Isaac, please refer to TensorflowInference Codelet, its API reference and Tensorflow Developer Guide.


Note that Tensorflow requires a non-trivial amount of resources, which may strain the system, and it can take some time to load on edge devices with limited resources.

Inference with TensorRT

TensorRT is a deep learning inference optimization tool and runtime from NVIDIA. It is designed to deliver low latency and high throughput. It supports models from all major frameworks, including Tensorflow, Caffe2, Chainer, Microsoft Cognitive Toolkit, MXNet and PyTorch.

For more information about working with TensorRT in Isaac, please refer to TensorRT Inference Codelet, its API reference and TensorRT Developer Guide.

Inference with Torch

Torch is a scientific computing framework with wide support for deep learning algorithms. Torch is easy to use and efficient, thanks to an easy and fast scripting language, Lua, and an underlying C/CUDA implementation.

For more information about working with Torch, please refer to the API reference and Torch Documentation.

Developing, training and tuning deep learning models requires massive amounts of data and computational power. Such heavy lifting tasks are expected to be performed by storage and GPUs in the cloud or in computing clusters, while the real robot applications run on edge devices with limited computational power.

A smooth workflow from data collection all the way to deep learning model deployment into robots accelerates robot application development.


Many ML developers are fluent in Python. The PyCodelet facilitates data transfer to and from the Isaac SDK in Python.



Instead of writing a codelet in C++, ML developers can write it in Python with PyCodelet. To start, declare the Python-based codelet as a subclass of alice.Codelet.


class MyPyCodeletProducer(Codelet):

Just as with a C++ codelet, the developer needs to override three member functions: start(), tick() and stop(). tick() is invoked periodically, on message arrival, or on timeout, while start() and stop() are invoked when the codelet enters or exits the running state.

Although a Python codelet follows the same concepts as a C++ codelet, there are some minor differences:

  • A Python codelet needs to retrieve hooks for messages explicitly via isaac_proto_tx() and isaac_proto_rx().

  • A Python codelet needs to retrieve a message via get_proto() and create a message via init_proto() explicitly from the hook.

  • To use a Python codelet in an application, the node for the Python codelet is created with the following JSON specification via alice.loadGraphFromFile().


{ "name": "foo_node", "components": [ { "name": "ml", "type": "isaac::alice::MessageLedger" }, { "name": "isaac.alice.PyCodelet", "type": "isaac::alice::PyCodelet" } ] }

alice.register_pycodelets() is then invoked explicitly to bind Python codelets to these nodes, using a mapping like the one below.


{ "producer": MyPyCodeletProducer, "consumer": MyPyCodeletConsumer }

Python codelets have access to parameters via JSON. See the python_ping sample for an example.
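The lifecycle and the name-to-class mapping described above can be sketched without the SDK. The block below is an illustration only and is not Isaac SDK code: the `Codelet` base class, the `run_app` driver, and the shared list used as a message channel are hypothetical stand-ins written for this example, so no Isaac API signature should be inferred from them. Only the shape follows the text: subclass Codelet, override start(), tick() and stop(), and bind names to classes with a mapping.

```python
class Codelet:
    """Hypothetical stand-in for alice.Codelet (NOT the Isaac SDK class)."""

    def start(self):
        pass

    def tick(self):
        pass

    def stop(self):
        pass


class MyPyCodeletProducer(Codelet):
    """Pushes an increasing counter into a shared list on every tick."""

    def __init__(self, channel):
        self.channel = channel
        self.count = 0
        self.running = False

    def start(self):
        self.running = True

    def tick(self):
        self.count += 1
        self.channel.append(self.count)

    def stop(self):
        self.running = False


class MyPyCodeletConsumer(Codelet):
    """Drains the shared list on every tick and remembers what it saw."""

    def __init__(self, channel):
        self.channel = channel
        self.received = []

    def tick(self):
        while self.channel:
            self.received.append(self.channel.pop(0))


def run_app(mapping, ticks):
    """Hypothetical driver: build each mapped class, then run the lifecycle."""
    channel = []  # stand-in for message passing between nodes
    codelets = {name: cls(channel) for name, cls in mapping.items()}
    for codelet in codelets.values():
        codelet.start()
    for _ in range(ticks):
        for codelet in codelets.values():
            codelet.tick()
    for codelet in codelets.values():
        codelet.stop()
    return codelets


codelets = run_app(
    {"producer": MyPyCodeletProducer, "consumer": MyPyCodeletConsumer},
    ticks=3,
)
print(codelets["consumer"].received)
```

In a real application the base class, the message hooks, and the scheduling come from the SDK; the sketch only shows where the three overridden functions fit.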

TensorflowInference Codelet

The TensorflowInference codelet takes a trained Tensorflow frozen model and runs inference in the Isaac SDK application. The input and output messages are both TensorList, which is a list of TensorProto messages. The codelet takes several parameters, as shown by the configuration for the ball segmentation inference application:


"model_file_path": "/tmp/ball_segmentation/ckpts/model-0-frozen.pb", "config_file_path": "", "input_tensor_info": [ { "ops_name": "input", "index": 0, "dims": [1, 256, 512, 3] } ], "output_tensor_info": [ { "ops_name": "output", "index": 0, "dims": [1, 256, 512, 1] } ]

  • model_file_path - Points to the Tensorflow frozen model to run in the application. For more information about the frozen model, refer to TensorFlow NVIDIA GPU-Accelerated container and Tensorflow tool.

  • config_file_path - Points to a protobuf file that contains a Tensorflow ConfigProto object for configuring the Tensorflow runtime. Use the following command as a starting point for customization. The file is opened in binary mode ('wb') because SerializeToString() returns bytes.


python -c "import tensorflow as tf; f=open('config_proto.pb', 'wb');f.write(tf.ConfigProto(allow_soft_placement=True, gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5)).SerializeToString()); f.close()"

  • input_tensor_info - List of input tensor specifications. Each specification includes the operation name (as in NodeDef), the index of an input within the operation, and tensor dimensions.

    For example, the following code specifies three input tensors:


    "input_tensor_info": [ { "ops_name": "input_color", "dims": [1, 3, 256, 512], "index": 0 }, { "ops_name": "input_color", "dims": [1, 3, 256, 512] "index": 1 }, { "ops_name": "input_depth", "dims": [1, 1, 256, 512] "index": 0 } ]

  • output_tensor_info - List of output tensor specifications. Each specification includes the operation name (as in NodeDef), the index of an output within the operation, and tensor dimensions.

    For example, the following code specifies output tensors of an operation:


    "output_tensor_info": [ { "ops_name": "Predictions/Reshape", "dims": [1, 1001], "index": 0 } ]

TensorRT Inference Codelet

The TensorRT Inference codelet takes a TensorRT .plan, .uff, or .onnx model and runs inference in the Isaac SDK application on the GPU. The input and output messages are both TensorList, which is a list of TensorProto messages.

TensorRT Inference takes the following parameters:

  • model_file_path - A path to a device-agnostic model in UFF or ONNX format. The UFF model can be created by the UFF Converter tool, or converted from a Tensorflow GraphDef, Frozen Protobuf Model, or Keras Model. Sample code and a Bazel build step that use this tool are available at packages/ml/tools. The ONNX format is an open format and is supported by Caffe2, MXNet, and PyTorch, with export to ONNX available for all major frameworks. See also: ONNX tutorials.


    The model file is ignored if the device-specific serialized engine file is found at a default or specified location (see below).

  • engine_file_path (optional) - A path to a device-specific serialized TensorRT engine. The engine can be created automatically by the Isaac TensorRT codelet from the model upon first start on the target device. This conversion and optimization for the target device may take from several seconds to several minutes, depending on the size of the model and the system’s performance. Alternatively, the engine can be pre-cached on the device.

    If this parameter is not specified, it defaults to the model file path with the extension replaced by .plan.


    The engine file, if it exists, takes precedence over the Model File.


    The engine is not portable between different devices, different versions of TensorRT, or different versions of the CuDNN library.

  • input_tensor_info - List of input tensor specifications. Each specification includes the operation name (as in NodeDef), tensor dimensions, and optional parameters.

    To give an example, the following code specifies two input tensors:


    "input_tensor_info": [ { "operation_name": "input_1", "dims": [1, 3, 256, 512] }, { "operation_name": "input_2", "dims": [1, 3, 256, 512] } ]

    Input Tensor Specification Parameters

    • operation_name - The name of the operation to look up in the model graph.

    • dims - The tensor dimensions. Note: In the TensorRT documentation, individual tensor dimensions may be referred to by the following names:


      "dims": [Batch Size, Channel, Rows, Columns]

      • Batch Size (Optional, Default = 1) - Number of samples in a batch.


        To specify a variable batch size, set this parameter to -1. The maximum batch size parameter (see below) must also be set.

      • Channels - Number of image channels or components of a vector.

      • Rows (Optional) - Number of rows of a matrix (or height of an image).

      • Columns (Optional) - Number of columns of a matrix (or width of an image).

      As an example, assume a TensorFlow/Keras model is trained on 320x200x3 (RGB) images. The correct setting for such a model is as follows:


      "dims": [1, 3, 200, 320]


      While the Input Order (or weights memory layout) during model training could use either of the following formats, this codelet only supports the channels_first format for the input tensor at inference time, regardless of the original framework input order:

      • channels_last: With the Channel as the last index of the tensor. This is referred to by the TensorRT documentation as ‘NHWC’ layout (Batch Number, Height, Width, Channel). For example, [1, 200, 320, 3].

      • channels_first: With the Channel as the first (or second) index of the tensor. This is referred to by the TensorRT documentation as ‘NCHW’ layout (Batch Number, Channel, Height, Width). For example, [1, 3, 200, 320].


      The ‘Batch Size’ tensor dimension is set to 1 and can be omitted in that example.

This list of Input Tensor Specifications is used by the Model Parser to cut out a part of the graph used for inference, to set the dimensions of variable-size inputs and to perform memory allocations. It is also used at inference time, to validate the rank and size of the input tensor.

The Input Tensor Specification should comply with:
  • Output of the previous node in the Isaac graph. Tensor ranks and dimensions of the Input Tensor Specification should match those of the output of the previous node. The Batch Size of the output of the previous node should be smaller than or equal to the maximum batch size of the engine.

    If the Batch Size is not set, the output of the previous node should omit it as well.

  • The specification of the Trained Model being parsed. The model should contain matching nodes with the sizes either fixed to match Input Tensor Specification or variable sizes (-1).

  • Limitations from TensorRT. TensorRT currently supports only the following input memory layouts:

    • (Batch Size, Channel, Rows, Columns), for example [1, 3, 480, 640];

    • (Channel, Rows, Columns), for example [3, 480, 640];

    • (Batch Size, Channel), for example [1, 1024];

    • (Channel), for example [1024].

  • output_tensor_info - List of output tensor specifications. Each specification includes the operation name (as in NodeDef) and tensor dimensions.

    For example, the following code specifies one output tensor:


    "output_tensor_info": [ { "operation_name": "output", "dims": [1, 1001] } ]

    See also input tensor specifications above.

  • max_batch_size (optional) - The batch size for which the engine will be tuned. At execution time, smaller batches may be used, but not larger. The default is -1, which specifies that the input tensor size will be used to infer the maximum batch size. Note: This parameter affects the amount of GPU memory allocated and engine performance.

    The input and output tensor specifications should have the same batch size. This batch size should be less than or equal to the maximum batch size of the model.

    If the batch size is equal to 1, this dimension can be omitted. For example, in:


    "dims": [1, 256, 512, 1]

    the batch size is equal to 1, so the first dimension can be omitted:


    "dims": [256, 512, 1]

    This avoids a TensorReshape operation.

    If the maximum batch size is set, this dimension can also be set to -1, for example:


    "dims": [-1, 256, 512, 1]

    In that case, the dimension is set at runtime, which enables variable batch size support.


    The maximum batch size is used at the engine optimization step. For optimal performance, set it to the actual value used at inference time.

  • max_workspace_size (optional) - The temporary GPU memory size for which the engine will be tuned. Layer algorithms often require temporary workspace. This parameter limits the maximum size that any layer in the network can use. If insufficient scratch space is provided, TensorRT may not be able to find an implementation for a given layer. Note: This parameter affects the amount of GPU memory allocated and engine performance.

  • inference_mode (optional) - Sets whether 8-bit and 16-bit kernels are permitted.

    • Float16 (default) - fp16 kernels are tried during engine build.

    • Float32 - Only fp32 kernels are allowed during engine build.

  • device_type (optional) - Sets the default device that this layer/network will execute on, GPU or DLA.

    • GPU (default) - The GPU is set as the default device during engine build.

    • DLA - The DLA engine is used as the default device during engine build.

  • allow_gpu_fallback (optional) - Allow fallback to GPU, if this layer/network can’t be executed on DLA.

  • force_engine_update (optional) - Forces an update of the CUDA engine, even if an input or cached .plan file is present. Debug feature.

  • plugins_lib_namespace (optional) - Initialize and register all the existing TensorRT plugins to the Plugin Registry with an optional namespace. To enable plugins, set the plugins_lib_namespace parameter. An empty string is a valid value for this parameter and it specifies the default TensorRT namespace:


    "plugins_lib_namespace": "",


    The function that enables access to the Plugin Registry (initLibNvInferPlugins) should only be called once. To prevent calling this function from multiple instances of TensorRT Inference codelet, only include the Plugins Namespace parameter for a single codelet instance.

  • verbose (optional) - Enables verbose log output. This option enables logging of DNN optimization progress. It is disabled by default to increase log file usability at the default setting. Debug feature.

Example configuration for the ball segmentation inference application:


"model_file_path": "external/ball_segmentation_model/model-9000-trimmed.uff",
"engine_file_path": "external/ball_segmentation_model/model-9000-trimmed.plan",
"input_tensor_info": [
  { "operation_name": "input", "dims": [1, 3, 256, 512] }
],
"output_tensor_info": [
  { "operation_name": "output", "dims": [1, 256, 512, 1] }
]

SampleAccumulator Codelet

The SampleAccumulator codelet is a component designed to buffer synthesized data from the simulator. Through its Python binding, SampleAccumulator can serve as a Tensorflow dataset for training ML models.

SampleAccumulator takes one parameter, the maximum number of samples to hold in the buffer.


"sample_buffer_size": 500

SampleAccumulatorViewer Codelet

The SampleAccumulatorViewer codelet visualizes simulation data queued in a SampleAccumulator instance. It searches the parent node for the SampleAccumulator instance and visualizes its queue buffer.

SampleAccumulatorViewer takes the following parameters:

  • Grid Size: An array of 2 positive integers that specifies how many images are stacked along the height and width.

  • Mosaic Size: An array of 2 positive integers that specifies the height and width, in pixels, of the generated visualization image.

  • Tick Period: The visualization update frequency.


"mosaic_samples": { "isaac.viewers.SampleAccumulatorViewer": { "grid_size": [8, 8], "mosaic_size": [1080, 1920], "tick_period": "100ms" }, "": { "sample_buffer_size": 64 } },

In the Isaac SDK, tensor data is stored and passed as TensorProto messages, the counterpart of the numpy ndarray data used in Tensorflow. Other data formats, such as images, must be converted for use in ML. Refer to for an example.

© Copyright 2018-2020, NVIDIA Corporation. Last updated on Oct 31, 2023.