Machine Learning Workflow

Machine Learning (ML) Workflow is a collection of code and samples intended to speed up the adoption of ML with the Isaac SDK. These samples use the Tensorflow framework for training, but the same principles and code should also work with other ML frameworks such as PyTorch.

Training with Simulation

Training data is hard to collect and even harder to label. Incorporating synthesized data from simulation can accelerate the process. The Ball Segmentation sample describes how to train an ML model from synthesized data.

Inference on PC and Edge Devices

The Isaac SDK provides two runtimes for performing inference with trained ML models on both Jetson TX2/Xavier and PC. All samples can be compiled and run on both platforms.

Tensorflow

Tensorflow is a popular ML framework from Google that is used for training in the samples presented here. The Isaac SDK can also use the Tensorflow runtime to perform inference with the trained model as-is. See the Ball Segmentation section of this documentation for a sample.

Note that Tensorflow requires a non-trivial amount of resources: it may strain the system and can take some time to load on edge devices with limited resources.

TensorRT

TensorRT is a deep learning inference optimization tool and runtime from NVIDIA. It is designed to deliver low latency and high throughput. Stereo Depth DNN presents a sample application performing inference with TensorRT.

For more information about working with TensorRT, refer to the TensorRT Sample.

Samples

Developing, training, and tuning deep learning models requires massive amounts of data and computational power. Such heavy-lifting tasks are expected to be performed with storage and GPUs in the cloud or in computing clusters, while real robot applications run on edge devices with limited computational power.

A smooth workflow from data collection all the way to deploying deep learning models on robots accelerates robot application development.

Ball Segmentation

Ball Segmentation is a sample demonstrating the Machine Learning workflow in the Isaac SDK. It covers training an ML model with simulation and deploying the model directly to either a PC or a Jetson TX2/Xavier device.

The Ball segmentation inference application is not supported on Jetson Nano because of memory limitations.


Fig. 1 Ball Segmentation Training Pipeline

As shown in Fig. 1 (Ball Segmentation Training Pipeline), the ML model needed for ball segmentation is trained using simulated data.

To train the model locally with simulation:

  1. Start Isaac Sim with the following command:

    ./Engine/Binaries/Linux/UE4Editor IsaacSimProject CarterWarehouse_P -vulkan
    
  2. In the simulator, open Content/Carter/IsaacSimGameModeBase and add the absolute path of the following file to Isaac Sim/JSONConfig Path:

    IsaacSDK/apps/samples/ball_segmentation/bridge_config/training_sim.json
    

    Fig. 2 Finding Game Mode Base in Isaac Simulator

  3. In the Isaac SDK, open the file mentioned above in an editor and fill in the absolute paths for the following files:

    IsaacSDK/apps/samples/ball_segmentation/bridge_config/training_sim.graph.json
    IsaacSDK/apps/samples/ball_segmentation/bridge_config/training_sim.config.json
    
  4. Click play to start the simulator.


    Fig. 3 Start Isaac Simulator

  5. Start the training pipeline with the following command.

    bazel run apps/samples/ball_segmentation:training
    
  6. (Optional) To monitor the training process, start Tensorboard with the following command:

    CUDA_VISIBLE_DEVICES=-1 tensorboard --logdir=/tmp/ball_segmentation/logs
    

    Alternatively, open Sight at http://localhost:3000 in a web browser. Fig. 4 shows possible output.


    Fig. 4 Ball Segmentation Training App output on Sight

By default, the trained model is stored in the following path:

/tmp/ball_navigation_training/logs

To use a different path, append -- --train_logdir followed by the new path to the command in step 5. For more parameters, refer to the training code:

IsaacSDK/apps/ball_navigation/training.py
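The bare -- in that command tells bazel run to forward the remaining arguments to the training program. The sketch below shows, under assumptions, how a Python script could accept such a flag with the standard argparse module. Only the --train_logdir name and its default path come from this documentation; the parse_training_args helper is hypothetical, and the actual training.py may use a different flag library.

```python
import argparse

# Default taken from the documentation above; everything else is illustrative.
DEFAULT_TRAIN_LOGDIR = "/tmp/ball_navigation_training/logs"


def parse_training_args(argv=None):
    """Parse options for a hypothetical training script.

    When launched with `bazel run <target> -- --train_logdir /data/logs`,
    bazel passes everything after the bare `--` to the program, so argv
    (or sys.argv[1:] when argv is None) holds ["--train_logdir", "/data/logs"].
    """
    parser = argparse.ArgumentParser(description="Ball segmentation training (sketch)")
    parser.add_argument(
        "--train_logdir",
        default=DEFAULT_TRAIN_LOGDIR,
        help="directory for checkpoints and training logs",
    )
    return parser.parse_args(argv)
```

For example, parse_training_args(["--train_logdir", "/data/logs"]).train_logdir evaluates to "/data/logs", while an empty argument list falls back to the documented default.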

Fig. 5 Ball Segmentation Inference Pipeline

To run inference for Ball Segmentation:

  1. Connect a ZED camera to the PC.

  2. Check the following configuration file and make sure model_file_path points to the latest frozen model produced by the training application (/tmp/ball_segmentation/ckpts/model-0-frozen.pb in this example):

    IsaacSDK/apps/samples/ball_segmentation/inference.config.json
    
  3. Start the ball segmentation inference application with the following command:

    bazel run ./apps/samples/ball_segmentation:inference
    
  4. Check Sight to see the inference results along with the feed from the input camera.
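Before starting inference, it can help to confirm that model_file_path really points at an existing file. The following is a small sketch using only the Python standard library. Because the exact layout of inference.config.json is not shown here, it searches the parsed JSON for every model_file_path key at any nesting depth; the helper names find_model_paths and check_model_paths are hypothetical.

```python
import json
import os


def find_model_paths(config):
    """Recursively collect every string value stored under a "model_file_path" key."""
    found = []
    if isinstance(config, dict):
        for key, value in config.items():
            if key == "model_file_path" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_model_paths(value))
    elif isinstance(config, list):
        for item in config:
            found.extend(find_model_paths(item))
    return found


def check_model_paths(config_path):
    """Load a JSON config file and report whether each referenced model file exists."""
    with open(config_path) as f:
        config = json.load(f)
    return [(path, os.path.isfile(path)) for path in find_model_paths(config)]
```

Run from the IsaacSDK directory, check_model_paths("apps/samples/ball_segmentation/inference.config.json") returns (path, exists) pairs; a False entry suggests the frozen model has not been produced yet or the path needs updating.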

Mosaic

Mosaic is a sample application that visualizes how simulation data is queued before being consumed. It consists of a simulation that generates machine-learning training data and an application that receives and queues the data and visualizes it in a mosaic pattern in Sight.

To start the visualization:

  1. Start Isaac Sim with the following command:

    ./Engine/Binaries/Linux/UE4Editor IsaacSimProject CarterWarehouse_P -vulkan
    
  2. In the simulator, open Content/Carter/IsaacSimGameModeBase and add the absolute path of the following file to Isaac Sim/JSONConfig Path:

    IsaacSDK/apps/samples/mosaic/bridge_config/mosaic_sim.json
    
  3. In the Isaac SDK, open the file mentioned above in an editor and fill in the absolute paths for the following files:

    IsaacSDK/apps/samples/mosaic/bridge_config/mosaic_sim.graph.json
    IsaacSDK/apps/samples/mosaic/bridge_config/mosaic_sim.config.json
    
  4. Click play to start the simulator.

  5. Start the Mosaic application with the following command:

    bazel run apps/samples/mosaic
    
  6. Open Sight in your browser at http://localhost:3000. A grid like the following should be displayed.


    Fig. 6 Data Generation Mosaic Visualization

PyCodelet

Many ML developers are fluent in Python. PyCodelet facilitates data transfer to and from the Isaac SDK in Python. For an example, see the following test:

IsaacSDK/apps/pyengine/alice/tests/pycodelet_test.py

Instead of writing codelets in C++, ML developers can write them in Python with PyCodelet. To start, declare the Python-based codelet as a subclass of alice.Codelet:

class MyPyCodeletProducer(Codelet):

As with a C++ codelet, the developer needs to override three member functions: start(), tick(), and stop(). tick() is invoked periodically, or upon message arrival or timeout, while start() and stop() are invoked when the codelet enters or exits the running state.
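To make the lifecycle concrete, here is a self-contained Python sketch. The Codelet base class and the run() driver below are hypothetical stand-ins written for illustration, not the alice API: the real scheduler decides when tick() runs, and the real base class provides message hooks.

```python
class Codelet:
    """Hypothetical stand-in for alice.Codelet, for illustration only.

    The real base class also supplies message hooks and is driven by the
    Isaac scheduler rather than by the run() loop below.
    """

    def start(self):
        pass

    def tick(self):
        pass

    def stop(self):
        pass


class MyPyCodeletProducer(Codelet):
    def start(self):
        # Invoked once when the codelet enters the running state.
        self.count = 0
        self.stopped = False

    def tick(self):
        # Invoked periodically, or on message arrival or timeout.
        self.count += 1

    def stop(self):
        # Invoked once when the codelet exits the running state.
        self.stopped = True


def run(codelet, ticks):
    """Toy driver: start, tick a fixed number of times, then stop."""
    codelet.start()
    for _ in range(ticks):
        codelet.tick()
    codelet.stop()
    return codelet
```

For example, run(MyPyCodeletProducer(), 3).count is 3. State that must survive between ticks is initialized in start() rather than in the constructor, mirroring the enter and exit points of the running state.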

Although a Python codelet is conceptually similar to a C++ codelet, there are some minor differences:

  • A Python codelet must explicitly retrieve the hooks for messages via isaac_proto_tx() and isaac_proto_rx().
  • A Python codelet must explicitly read a received message via get_proto() and create an outgoing message via init_proto() on the corresponding hook.
  • To use a Python codelet in an application, its node is created from the following JSON specification via alice.loadGraphFromFile():
{
  "name": "foo_node",
  "components": [
    {
      "name": "ml",
      "type": "isaac::alice::MessageLedger"
    },
    {
      "name": "isaac.alice.PyCodelet",
      "type": "isaac::alice::PyCodelet"
    }
  ]
}
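When an application needs several Python codelets, the node specification above can be generated rather than copied by hand. This is a minimal sketch using the standard json module. pycodelet_node() reproduces the node layout shown above; the helper names and the choice to emit a plain list of nodes are assumptions, since the surrounding graph-file structure is not shown here.

```python
import json


def pycodelet_node(name):
    """Return a node spec hosting a Python codelet, matching the JSON above:
    a MessageLedger component plus an isaac::alice::PyCodelet component."""
    return {
        "name": name,
        "components": [
            {"name": "ml", "type": "isaac::alice::MessageLedger"},
            {"name": "isaac.alice.PyCodelet", "type": "isaac::alice::PyCodelet"},
        ],
    }


def pycodelet_nodes_json(names):
    """Serialize one PyCodelet node per name as a JSON list (hypothetical helper)."""
    return json.dumps([pycodelet_node(name) for name in names], indent=2)
```

For example, pycodelet_nodes_json(["producer", "consumer"]) yields two nodes named like the keys of the mapping used with alice.register_pycodelets().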

alice.register_pycodelets() is then invoked explicitly to bind the Python codelets to these nodes, using a mapping like the one below.

{
  "producer": MyPyCodeletProducer,
  "consumer": MyPyCodeletConsumer
}
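The mapping pairs names with codelet classes. Assuming the keys are the names of nodes created from the JSON specification above, the binding step can be pictured with the hypothetical pure-Python sketch below. It only illustrates the idea of instantiating one codelet per named node and checking the names against the graph; it is not how alice.register_pycodelets() is implemented.

```python
class MyPyCodeletProducer:
    """Placeholder producer codelet (hypothetical, empty body)."""


class MyPyCodeletConsumer:
    """Placeholder consumer codelet (hypothetical, empty body)."""


def bind_pycodelets(node_names, mapping):
    """Instantiate one codelet per mapped node name.

    Raises ValueError if the mapping names a node that the graph lacks,
    which catches typos between the graph JSON and the mapping.
    """
    missing = sorted(name for name in mapping if name not in node_names)
    if missing:
        raise ValueError(f"no graph node named: {', '.join(missing)}")
    return {name: cls() for name, cls in mapping.items()}


mapping = {
    "producer": MyPyCodeletProducer,
    "consumer": MyPyCodeletConsumer,
}
```

For example, bind_pycodelets(["producer", "consumer"], mapping) returns one instance per node, while omitting "consumer" from the node list raises ValueError.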

Supporting Code

TensorflowInference Codelet

The TensorflowInference codelet takes a trained Tensorflow frozen model and runs inference in the Isaac SDK application. Both the input and output messages are TensorList messages, that is, lists of TensorProto messages. The codelet takes several parameters, as shown in the configuration for the ball segmentation inference application:

"model_file_path": "/tmp/ball_segmentation/ckpts/model-0-frozen.pb",
"config_file_path": "",
"input_tensor_info": [
  {
    "ops_name": "input",
    "index": 0,
    "dims": [1, 256, 512, 3]
  }
],
"output_tensor_info": [
  {
    "ops_name": "output",
    "index": 0,
    "dims": [1, 256, 512, 1]
  }
]
  • Model File Path. This points to the Tensorflow frozen model to be run in the application. For more information about the frozen model, refer to TensorFlow NVIDIA GPU-Accelerated container and Tensorflow tool.
  • Config File Path. This points to a protobuf file containing a Tensorflow ConfigProto object for configuring the Tensorflow runtime. Use the following command as a starting point for customization.
python -c "import tensorflow as tf; f = open('config.txt', 'wb'); f.write(tf.ConfigProto(allow_soft_placement=True, gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5)).SerializeToString()); f.close()"
  • List of Input Tensor Specifications. Each specification includes the operator name (as in NodeDef), the tensor index, and the tensor dimensions.
  • List of Output Tensor Specifications. These use the same format as the input specifications.
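The tensor specifications can be checked on the Python side before data reaches the codelet. The sketch below uses NumPy and the values from the configuration above. Reading [1, 256, 512, 3] as batch, height, width, channels (NHWC) is an assumption, and check_shapes is a hypothetical helper. tensor_name() follows TensorFlow's usual op_name:output_index naming, for example input:0.

```python
import numpy as np

# Tensor specifications copied from the ball segmentation configuration above.
input_tensor_info = [{"ops_name": "input", "index": 0, "dims": [1, 256, 512, 3]}]
output_tensor_info = [{"ops_name": "output", "index": 0, "dims": [1, 256, 512, 1]}]


def tensor_name(spec):
    """TensorFlow refers to a tensor as '<op name>:<output index>'."""
    return f"{spec['ops_name']}:{spec['index']}"


def check_shapes(tensors, specs):
    """Raise ValueError if the arrays do not match the specs one-to-one."""
    if len(tensors) != len(specs):
        raise ValueError(f"expected {len(specs)} tensors, got {len(tensors)}")
    for array, spec in zip(tensors, specs):
        if list(array.shape) != spec["dims"]:
            raise ValueError(f"{tensor_name(spec)}: shape {array.shape} != {spec['dims']}")


# A batch holding one 256x512 RGB image (assumed NHWC) matches the input spec.
batch = np.zeros((1, 256, 512, 3), dtype=np.float32)
check_shapes([batch], input_tensor_info)
```

Under the same reading, the output spec [1, 256, 512, 1] is a single-channel map with the same height and width as the input, which fits a per-pixel segmentation result.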

SampleAccumulator Codelet

The SampleAccumulator codelet is a component designed to buffer synthesized data from the simulator. Through its Python binding, SampleAccumulator can serve as a Tensorflow dataset for training ML models.

SampleAccumulator takes one parameter: the maximum number of samples to hold in the buffer.

"sample_buffer_size": 500

MosaicViewer Codelet

The MosaicViewer codelet visualizes simulation data queued in a SampleAccumulator instance. It searches the parent node for the SampleAccumulator instance and visualizes its queue buffer.

MosaicViewer takes the following parameters:

  • Grid Size: An array of 2 positive integers that specifies how many images are stacked along the height and width, respectively.
  • Mosaic Size: An array of 2 positive integers that specifies the height and width, in pixels, of the generated visualization image.
  • Tick Period: The interval between visualization updates (for example, "100ms").
"mosaic_samples": {
  "isaac.viewers.MosaicViewer": {
    "grid_size": [8, 8],
    "mosaic_size": [1080, 1920],
    "tick_period": "100ms"
  },
  "isaac.ml.SampleAccumulator": {
    "sample_buffer_size": 64
  }
},
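With the values above, each of the 8 x 8 = 64 cells is 1080 / 8 = 135 pixels high and 1920 / 8 = 240 pixels wide, the 64-sample buffer exactly fills the grid, and a 100ms tick period means about 10 updates per second. The NumPy sketch below shows one way such a mosaic could be assembled. It assumes the images are already resized to the tile size and are placed row by row; MosaicViewer's actual layout and resizing may differ, and the function names are hypothetical.

```python
import numpy as np


def tile_size(grid_size, mosaic_size):
    """Pixel size (height, width) of one cell, assuming the grid divides the mosaic evenly."""
    rows, cols = grid_size
    height, width = mosaic_size
    return height // rows, width // cols


def build_mosaic(images, grid_size, mosaic_size):
    """Place images row-major into a grid; cells without an image stay black.

    Each image must be a uint8 array of shape (tile_h, tile_w, 3); extra
    images beyond rows * cols are ignored.
    """
    rows, cols = grid_size
    tile_h, tile_w = tile_size(grid_size, mosaic_size)
    mosaic = np.zeros((mosaic_size[0], mosaic_size[1], 3), dtype=np.uint8)
    for i, image in enumerate(images[: rows * cols]):
        r, c = divmod(i, cols)
        mosaic[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w] = image
    return mosaic
```

For example, tile_size([8, 8], [1080, 1920]) returns (135, 240), and build_mosaic with two white tiles fills the first two cells of the top row.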

Tensors

In the Isaac SDK, tensor data is stored and passed as TensorProto messages, the counterpart of the numpy ndarray data used in Tensorflow. Other data formats, such as images, must be converted to tensors before they can be used for ML. Refer to IsaacSDK/packages/ml/ColorCameraEncoder.cpp for an example.
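The kind of conversion described above can be sketched with NumPy: an 8-bit color image becomes a floating-point tensor with a leading batch dimension, the same shape pattern as the [1, 256, 512, 3] input of the ball segmentation model. Scaling pixel values to [0, 1] is an assumption made for illustration; ColorCameraEncoder may normalize differently, and image_to_tensor is a hypothetical helper.

```python
import numpy as np


def image_to_tensor(image):
    """Convert an HxWx3 uint8 image into a 1xHxWx3 float32 tensor in [0, 1].

    The 1/255 scaling is an illustrative assumption, not necessarily what
    ColorCameraEncoder does.
    """
    if image.ndim != 3 or image.shape[2] != 3 or image.dtype != np.uint8:
        raise ValueError("expected an HxWx3 uint8 image")
    tensor = image.astype(np.float32) / 255.0
    return tensor[np.newaxis, ...]  # add the batch dimension


# A 256x512 RGB frame becomes a tensor of shape (1, 256, 512, 3).
frame = np.zeros((256, 512, 3), dtype=np.uint8)
print(image_to_tensor(frame).shape)
```

Keeping the batch dimension explicit lets the same function feed a model whose input spec starts with a batch size of 1.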