Machine Learning Workflow

Machine Learning (ML) Workflow is the collection of code and samples intended to speed up adoption of ML with the Isaac SDK. These samples use the Tensorflow framework for training, but the same principles and code should also work with other ML frameworks like PyTorch.

Training with Simulation

Training data is hard to collect and harder to label. Incorporating synthesized data from simulation can accelerate the process. Free Space Segmentation and Ball Segmentation describe training sample ML models from synthesized data with Unity3D and Unreal Engine 4.

Inference on PC and Edge Devices

Three possible runtimes are provided in the Isaac SDK to perform inference with trained ML models on both Jetson TX2/Xavier and PC. All samples can be compiled and run on both platforms.

Inference with Tensorflow

Tensorflow is a popular ML framework from Google which is used for training in the samples presented here. The Isaac SDK also works with the Tensorflow runtime to perform inference with the trained model as-is.

Ball Segmentation and Stereo Depth DNN present sample applications performing inference with Tensorflow.

For more information about working with Tensorflow in Isaac, please refer to TensorflowInference Codelet, its API reference isaac.ml.TensorflowInference and Tensorflow Developer Guide.

Note

Note that Tensorflow requires a non-trivial amount of resources, which may strain the system and result in long load times on edge devices with limited resources.

Inference with TensorRT

TensorRT is a deep learning inference optimization tool and runtime from NVIDIA. It is designed to deliver low latency and high throughput. It supports models from all major frameworks, including Tensorflow, Caffe 2, Chainer, Microsoft Cognitive Toolkit, MXNet, and PyTorch.

Ball Segmentation presents a sample application performing inference with TensorRT.

For more information about working with TensorRT in Isaac, please refer to the TensorRT Inference Codelet, its API reference isaac.ml.TensorRTInference, and the TensorRT Developer Guide.

Inference with Torch

Torch is a scientific computing framework with wide support for deep learning algorithms. Torch is easy to use and efficient, thanks to an easy and fast scripting language, Lua, and an underlying C/CUDA implementation.

For more information about working with Torch, please refer to Torch inference API reference isaac.ml.TorchInference and Torch Documentation.

Samples

Developing, training and tuning deep learning models requires massive amounts of data and computational power. Such heavy lifting tasks are expected to be performed by storage and GPUs in the cloud or in computing clusters, while the real robot applications run on edge devices with limited computational power.

A smooth workflow from data collection all the way to deep learning model deployment into robots accelerates robot application development.

Ball Segmentation

Ball Segmentation is a sample demonstrating the Machine Learning workflow in the Isaac SDK. It includes training an ML model with simulation and deploying the model directly to either a PC or a Jetson TX2/Xavier/Nano device.

Note

On Jetson Nano the Ball segmentation inference application is supported only with TensorRT inference because of memory limitations.

../../_images/ball_seg_train.png

Fig. 1 Ball Segmentation Training Pipeline

As shown in Ball Segmentation Training Pipeline, the ML model needed for ball segmentation is trained using simulated data.

To train the model locally with simulation

  1. Launch Isaac Sim with the following command and add the absolute path for the JSON file.

    ./Engine/Binaries/Linux/UE4Editor IsaacSimProject CarterWarehouse_P -vulkan -isaac_sim_config_json="<IsaacSDK>/apps/samples/ball_segmentation/bridge_config/training_sim.json"
    
  2. In the Isaac SDK, open the JSON file used above in an editor and add the absolute paths for the following graph and config files.

    <IsaacSDK>/apps/samples/ball_segmentation/bridge_config/training_sim.graph.json
    <IsaacSDK>/apps/samples/ball_segmentation/bridge_config/training_sim.config.json
    
  3. Start the training pipeline with the following command.

    bazel run apps/samples/ball_segmentation/training
    
  4. (Optional) To monitor the training process, start Tensorboard with the following command:

    CUDA_VISIBLE_DEVICES=-1 tensorboard --logdir=/tmp/ball_segmentation/logs
    

    You can also open Sight at http://localhost:3000 in a web browser to check the output, which should look like the following:

    ../../_images/ball_seg_train_sight.png

    Fig. 2 Ball Segmentation Training App output on Sight.

  5. (Optional) To convert the trained model to the UFF (.uff) format supported by the TensorRT codelet, run the following commands (tensorrt, uff-converter-tf, and a ‘model-9000-frozen.pb’ file are required as input):

    cd /tmp/ball_segmentation/
    cp ckpts/model-9000-frozen.pb ./
    cp <IsaacSDK>/apps/samples/ball_segmentation/ball_validation_dataset/images/4724.jpg ./
    python3 <IsaacSDK>/apps/samples/ball_segmentation/trained_to_tensorrt.py
    

    This should produce ./model-9000-trimmed.uff in the /tmp/ball_segmentation/ folder.

    Note

    The trained_to_tensorrt.py script also generates TensorRT Engine (.plan) and performs a spot check on outputs of Tensorflow and TensorRT (recommended). The TensorRT Engine file is device and library version specific. The engine is not portable between different devices, different versions of TensorRT or different versions of the CuDNN library.

The trained model is stored in the following path by default:

/tmp/ball_segmentation/logs

To use a different path, append the -- --train_logdir parameter to the training command in step 3. For more parameters, please refer to the training code:

<IsaacSDK>/apps/samples/ball_segmentation/training.py
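
For example, assuming a custom log directory of /tmp/my_ball_segmentation_logs (this path is only an illustration), the training command from step 3 becomes:

bazel run apps/samples/ball_segmentation/training -- --train_logdir /tmp/my_ball_segmentation_logs
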
../../_images/ball_seg_inf.png

Fig. 3 Ball Segmentation Inference Pipeline

To Run Inference for Ball Segmentation

  1. Connect a ZED camera to the PC.

  2. Check the following configuration file and make sure model_file_path is pointing to the latest trained frozen model from the training application (/tmp/ball_segmentation/ckpts/model-0-frozen.pb, in this example).

    <IsaacSDK>/apps/samples/ball_segmentation/inference.config.json
    
  3. Start the ball segmentation inference application with the following command:

    bazel run ./apps/samples/ball_segmentation:inference
    
  4. Check Sight to see the inference results along with the feed from the input camera.

To Run TensorRT Inference for Ball Segmentation

  1. Start the ball segmentation inference application with the following command:

    bazel run apps/samples/ball_segmentation:inference_tensorrt
    
  2. Check Sight to see the inference results along with the input image.

Mosaic

Mosaic is a sample application that visualizes how simulation data is queued before being consumed. It includes a simulation generating machine-learning training data and an application that receives, queues, and visualizes the samples in a mosaic pattern via Sight.

To start the visualization:

  1. Launch Isaac Sim with the following command and add the absolute path for the JSON file.

    ./Engine/Binaries/Linux/UE4Editor IsaacSimProject CarterWarehouse_P -vulkan -isaac_sim_config_json="<IsaacSDK>/apps/samples/mosaic/bridge_config/mosaic_sim.json"
    
  2. In the Isaac SDK, open the JSON file used above in an editor and add the absolute paths for the following graph and config files.

    <IsaacSDK>/apps/samples/mosaic/bridge_config/mosaic_sim.graph.json
    <IsaacSDK>/apps/samples/mosaic/bridge_config/mosaic_sim.config.json
    
  3. Start the Mosaic application with the following command.

    bazel run apps/samples/mosaic
    
  4. Open Sight in your browser at http://localhost:3000; a grid like the following should be displayed.

    ../../_images/mosaic.png

    Fig. 4 Data Generation Mosaic Visualization

PyCodelet

Many ML developers are fluent in Python. The PyCodelet facilitates data transfer to and from the Isaac SDK in Python.

<IsaacSDK>/apps/engine/pyalice/tests/pycodelet_test.py

Instead of coding a codelet in C++, ML developers can code in Python with PyCodelet. To start, declare the Python-based codelet as a subclass of alice.Codelet.

class MyPyCodeletProducer(Codelet):

Just like a C++-based codelet, the developer needs to override three member functions: start(), tick(), and stop(). tick() is invoked periodically, upon message arrival, or on timeout, while start() and stop() are invoked when the codelet enters or exits the running state.

Though the Python codelet shares the same concepts as the C++ codelet, there are some minor differences:

  • A Python codelet needs to retrieve hooks for messages explicitly via isaac_proto_tx() and isaac_proto_rx().
  • A Python codelet needs to retrieve a message via get_proto() and create a message via init_proto() explicitly from the hook.
  • To use a Python codelet in an application, the node for the Python codelet is created with the following JSON specification via alice.loadGraphFromFile().
{
  "name": "foo_node",
  "components": [
    {
      "name": "ml",
      "type": "isaac::alice::MessageLedger"
    },
    {
      "name": "isaac.alice.PyCodelet",
      "type": "isaac::alice::PyCodelet"
    }
  ]
}

alice.register_pycodelets() is invoked explicitly later to bind Python codelets to these nodes, using a mapping like the one presented below.

{
  "producer": MyPyCodeletProducer,
  "consumer": MyPyCodeletConsumer
}
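
Putting these pieces together, a minimal producer codelet might look like the following sketch. Only the hook calls listed above are taken from this document; the import path, the proto type, and helper names such as tick_periodically() and publish() are assumptions for illustration, so check the pycodelet_test.py file referenced above for the exact API.

# Minimal sketch of a Python codelet. Codelet is provided by the Isaac pyalice
# module; the exact import path is omitted here (see pycodelet_test.py).
class MyPyCodeletProducer(Codelet):
    def start(self):
        # Retrieve the transmitting hook explicitly, as described above.
        self.tx = self.isaac_proto_tx("PingProto", "ping")
        # Assumed helper for periodic ticking; see the test file for the exact call.
        self.tick_periodically(0.5)

    def tick(self):
        # Create the outgoing message explicitly from the hook and publish it.
        self.tx.init_proto().message = "hello"
        self.tx.publish()

    def stop(self):
        pass
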

Python codelets have access to parameters via JSON. Check python_ping for a sample.

Supporting Code

TensorflowInference Codelet

The TensorflowInference codelet takes a trained Tensorflow frozen model and runs inference in the Isaac SDK application. The input and output messages are both TensorList, which is a list of TensorProto messages. The codelet takes several parameters, as shown by the configuration for the ball segmentation inference application:

"model_file_path": "/tmp/ball_segmentation/ckpts/model-0-frozen.pb",
"config_file_path": "",
"input_tensor_info": [
  {
    "ops_name": "input",
    "index": 0,
    "dims": [1, 256, 512, 3]
  }
],
"output_tensor_info": [
  {
    "ops_name": "output",
    "index": 0,
    "dims": [1, 256, 512, 1]
  }
]
  • model_file_path - Points to the Tensorflow frozen model to run in the application. For more information about the frozen model, refer to TensorFlow NVIDIA GPU-Accelerated container and Tensorflow tool.
  • config_file_path - Points to a protobuf file that contains a Tensorflow ConfigProto object for configuring Tensorflow runtime. Use the following command as a starting point for customization.
python -c "import tensorflow as tf; f=open('config_proto.pb', 'wb'); f.write(tf.ConfigProto(allow_soft_placement=True, gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5)).SerializeToString()); f.close()"
  • input_tensor_info - List of input tensor specifications. Each specification includes the operation name (as in NodeDef), the index of an input within the operation, and tensor dimensions.

    For example, the following code specifies three input tensors:

    "input_tensor_info": [
      {
        "ops_name": "input_color",
        "dims": [1, 3, 256, 512],
        "index": 0
      },
      {
        "ops_name": "input_color",
        "dims": [1, 3, 256, 512]
        "index": 1
      },
      {
        "ops_name": "input_depth",
        "dims": [1, 1, 256, 512]
        "index": 0
      }
    ]
    
  • output_tensor_info - List of output tensor specifications. Each specification includes the operation name (as in NodeDef), the index of an output within the operation, and tensor dimensions.

    For example, the following code specifies output tensors of an operation:

    "output_tensor_info": [
      {
        "ops_name": "Predictions/Reshape",
        "dims": [1, 1001],
        "index": 0
      }
    ]
    

TensorRT Inference Codelet

The TensorRT Inference codelet takes a TensorRT .plan, .uff, or .onnx model and runs inference in the Isaac SDK application on the GPU. The input and output messages are both TensorList, which is a list of TensorProto messages.

TensorRT Inference takes the following parameters:

  • model_file_path - A path to a device-agnostic model in UFF or ONNX format. The UFF model can be created by the UFF Converter tool, or converted from a Tensorflow GraphDef, Frozen Protobuf Model, or Keras Model. Sample code and a Bazel build step that utilize this tool are available at packages/ml/tools. The ONNX format is an open format and is supported by Caffe2, MXNet, and PyTorch, with export to ONNX available for all major frameworks. See also: ONNX tutorials.

    Note

    The model file is ignored if the device-specific serialized engine file is found at a default or specified location (see below).

  • engine_file_path (optional) - A path to a device-specific serialized TensorRT engine. The engine can be created automatically by the Isaac TensorRT codelet from the model upon first start on the target device. This conversion process and optimization for the target device may take from several seconds to several minutes, depending on the size of the model and the system’s performance. Alternatively, the engine can be pre-cached on the device.

    If this parameter is not specified, it defaults to the Model File Path with the extension replaced by .plan.

    Note

    The engine file, if it exists, takes precedence over the Model File.

    Note

    The engine is not portable between different devices, different versions of TensorRT, or different versions of the CuDNN library.

  • input_tensor_info - List of input tensor specifications. Each specification includes the operation name (as in NodeDef), tensor dimensions, and optional parameters.

    To give an example, the following code specifies two input tensors:

    "input_tensor_info": [
      {
        "operation_name": "input_1",
        "dims": [1, 3, 256, 512]
      },
      {
        "operation_name": "input_2",
        "dims": [1, 3, 256, 512]
      }
    ]
    

    Input Tensor Specification Parameters

    • operation_name - The name of the operation to look up in the model graph.

    • dims - The tensor dimensions. Note: In the TensorRT documentation, individual tensor dimensions may be referred to by the following names:

      "dims": [Batch Size, Channel, Rows, Columns]
      
      • Batch Size (Optional, Default = 1) - Number of samples in a batch.

        Note

        To specify a variable batch size, set this parameter to -1. The maximum batch size parameter (see below) must also be set.

      • Channels - Number of image channels or components of a vector.

      • Rows (Optional) - Number of rows of a matrix (or height of an image).

      • Columns (Optional) - Number of columns of a matrix (or width of an image).

    • uff_input_order (optional) - Original framework input order (or weights memory layout).

      "uff_input_order": "channels_last",
      

      The UFF Input Order parameter must be set to the input order in which the model weights are stored in the UFF file. Normally, it is the same as the order of the input data at training time.

      TensorRT supports the following UFF Input Order (or weights memory layout) formats:
      • channels_last - With the Channel as the last index of the tensor. This is referred to in the TensorRT documentation as the ‘NHWC’ layout (Batch Number, Height, Width, Channel). For example, [1, 480, 640, 3];
      • channels_first - With the Channel as the first (or second) index of the tensor. This is referred to in the TensorRT documentation as the ‘NCHW’ layout (Batch Number, Channel, Height, Width). For example, [1, 3, 480, 640];
      The UFF Input Order parameter can be omitted for the vector data layout, referred to in the TensorRT documentation as the ‘NC’ layout (Batch Number, Channel). For example, [1, 1024].

      To give an example, assume a TensorFlow/Keras model is trained with the default input order of channels_last, on 320x200x3 (RGB) images. The correct setting for such a model is:

      "dims": [1, 3, 200, 320]
      "uff_input_order": "channels_last",
      

      Note

      In this example, the input order to TensorRT is ‘channels first’, while the model was originally trained with ‘channels last’. Note also that the ‘Batch Size’ parameter is set to 1 and can be omitted in that example.

This list of Input Tensor Specifications is used by the Model Parser to cut out the part of the graph used for inference, to set the dimensions of variable-size inputs, and to perform memory allocations. It is also used at inference time to validate the rank and size of the input tensors.

The Input Tensor Specification should comply with:
  • Output of the previous node in the Isaac graph. The tensor ranks and dimensions of the Input Tensor Specification should match those of the output of the previous node. The Batch Size of the output of the previous node should be smaller than or equal to the maximum batch size of the engine.

    If the Batch Size is not set, the previous node output should omit it also:

    "input_tensor_info": [
      {
        "dims": [3, 256, 512]
    ...
    
    "isaac.ml.TensorReshape": {
      "output_tensors_dimension": [[3, 256, 512]]
    
  • The specification of the Trained Model being parsed. The model should contain matching nodes with sizes either fixed to match the Input Tensor Specification or variable (-1).

  • Limitations from TensorRT. TensorRT currently only supports the following input memory layouts:

    • (Batch Size, Channel, Rows, Columns), for example [1, 3, 480, 640];
    • (Channel, Rows, Columns), for example [3, 480, 640];
    • (Batch Size, Channel), for example [1, 1024];
    • (Channel), for example [1024].

    Note

    TensorRT input order for images is color planes, not an interleaved color channel array (i.e. not RGB). Normally, this requires converting images into the color-planes format or transposing the channel axis. This can be achieved by selecting the relevant image encoding order, for example:

    "tensor_encoder": {
      "isaac.ml.ColorCameraEncoder": {
        "rows": 480,
        "cols": 640,
        "pixel_normalization_mode": "PositiveNegative",
        "tensor_index_order": "201"
      }
    }
  • output_tensor_info - List of output tensor specifications. Each specification includes the operation name (as in NodeDef) and tensor dimensions.

    For example, the following code specifies one output tensor:

    "output_tensor_info": [
      {
        "operation_name": "output",
        "dims": [1, 1001]
      }
    ]
    

    See also input tensor specifications above.

  • max_batch_size (optional) - The batch size for which the engine will be tuned. At execution time, smaller batches may be used, but not larger ones. The default is ‘-1’, which specifies that the input tensor size will be used to infer the maximum batch size. Note: this parameter affects the amount of GPU memory allocated and engine performance.

    The input and output tensor specifications should have the same batch size. This batch size should be smaller than or equal to the Maximum Batch Size of the model.

    If the batch size is equal to 1, this dimension can be omitted. For example, for:

    "dims": [1, 256, 512, 1]
    

    the batch size is equal to 1 and the first dimension can be omitted:

    "dims": [256, 512, 1]
    

    This makes it possible to avoid a TensorReshape operation.

    If the maximum batch size is set, this dimension can also be set to -1, for example:

    "dims": [-1, 256, 512, 1]
    

    In that case, the dimension is set at runtime. This enables variable batch size support.

    Note

    The maximum batch size is used at the engine optimization step; for optimal performance, it is recommended to set it to the actual value used at inference time.

  • max_workspace_size (optional) - The temporary GPU memory size for which the engine will be tuned. Layer algorithms often require temporary workspace. This parameter limits the maximum size that any layer in the network can use. If insufficient scratch is provided, it is possible that TensorRT may not be able to find an implementation for a given layer.

    Note

    This parameter affects the amounts of GPU memory allocated and engine performance.

  • inference_mode (optional) - Sets whether or not 8-bit and 16-bit kernels are permitted.

    • Float16 (default) - fp16 kernels are tried during the engine build when this mode is enabled.
    • Float32 - only fp32 kernels are allowed during the engine build.

  • device_type (optional) - Sets the default device on which this layer/network will execute, GPU or DLA.

    • GPU (default) - the GPU is set as the default device during the engine build.
    • DLA - the DLA engine is used as the default device during the engine build.

  • allow_gpu_fallback (optional) - Allows falling back to the GPU if this layer/network cannot be executed on DLA.

  • force_engine_update (optional) - Forces an update of the CUDA engine, even if an input or cached .plan file is present. Debug feature.

  • plugins_lib_namespace (optional) - Initialize and register all the existing TensorRT plugins to the Plugin Registry with an optional namespace. To enable plugins, set the plugins_lib_namespace parameter. An empty string is a valid value for this parameter and it specifies the default TensorRT namespace:

    "plugins_lib_namespace": "",
    

    Note

    The function that enables access to the Plugin Registry (initLibNvInferPlugins) should only be called once. To prevent calling this function from multiple instances of TensorRT Inference codelet, only include the Plugins Namespace parameter for a single codelet instance.

  • verbose (optional) - Enables verbose log output, including logging of DNN optimization progress. It is disabled by default to keep the log file readable. Debug feature.

Example configuration for the ball segmentation inference application:

"model_file_path": "external/ball_segmentation_model/model-9000-trimmed.uff",
"engine_file_path": "external/ball_segmentation_model/model-9000-trimmed.plan",
"input_tensor_info": [
  {
    "operation_name": "input",
    "uff_input_order": "channels_last",
    "dims": [1, 3, 256, 512]
  }
],
"output_tensor_info": [
  {
    "operation_name": "output",
    "dims": [1, 256, 512, 1]
  }
]

SampleAccumulator Codelet

The SampleAccumulator codelet is a component designed to buffer synthesized data from the simulator. Using its Python binding, SampleAccumulator can serve as a Tensorflow dataset for training ML models.

SampleAccumulator takes one parameter, the maximum number of samples to hold in the buffer:

"sample_buffer_size": 500

MosaicViewer Codelet

The MosaicViewer codelet visualizes simulation data queued in a SampleAccumulator instance. It searches the parent node for the SampleAccumulator instance and visualizes its queue buffer.

MosaicViewer takes the following parameters:

  • grid_size: An array of two positive integers specifying how many images are stacked along the height and width.
  • mosaic_size: An array of two positive integers specifying the height and width, in pixels, of the generated visualization image.
  • tick_period: The visualization update frequency.
"mosaic_samples": {
  "isaac.viewers.MosaicViewer": {
    "grid_size": [8, 8],
    "mosaic_size": [1080, 1920],
    "tick_period": "100ms"
  },
  "isaac.ml.SampleAccumulator": {
    "sample_buffer_size": 64
  }
},

Tensors

In the Isaac SDK, tensor data is stored and passed as messages of TensorProto, which is the counterpart of the numpy ndarray data used in Tensorflow. Conversion is needed to accommodate other data formats, such as images, in ML pipelines. Refer to IsaacSDK/packages/ml/ColorCameraEncoderCpu.cpp for an example.
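
As an illustration of the kind of conversion such an encoder performs, the sketch below turns an RGB image (rows x cols x channels, uint8) into a float32 tensor with the channel axis first and pixel values normalized to [-1, 1]. The normalization range and index order here are illustrative assumptions; they correspond to configuration choices such as pixel_normalization_mode and tensor_index_order shown in the ColorCameraEncoder example above.

import numpy as np

def image_to_tensor(image_hwc_uint8):
    """Convert an HWC uint8 image into a CHW float32 tensor normalized to [-1, 1]."""
    # Scale [0, 255] to [-1, 1]; the exact normalization mode is a configuration choice.
    tensor = image_hwc_uint8.astype(np.float32) / 127.5 - 1.0
    # Move the channel axis first (HWC -> CHW), i.e. a "channels first" planar layout.
    return np.transpose(tensor, (2, 0, 1))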