Free Space Segmentation

This application helps create a model that allows the robot to perform real-time path segmentation using only a monocular camera feed. Essentially, given an image, the model should be able to distinguish free (traversable) space from obstacles and other non-traversable space.

The costmap or obstacle map for the robot's environment can be created from various sources, such as Lidar or depth information from the camera. Fusing information from different sensors to fine-tune the costmap can make the robot's obstacle avoidance more robust. Ideally, the free space determined by the path segmentation model could be projected into the real-world coordinate system and used as input for obstacle avoidance.

Data Collection

Data can be collected from public data sets, or generated in simulation.

Public datasets

Datasets like MS COCO and ADE20K provide per-pixel segmentation data for multiple classes. Relevant classes for path segmentation include pavement, road, carpet, earth, ground, and multiple types of floors. The images are captured in various environments with multiple camera angles and degrees of occlusion, making them good candidates for training a model that generalizes well.


Simulation

Being able to generate unlimited data points through simulation is a powerful asset. Bridging the "reality gap" that separates simulated robotics from real experiments may vastly improve the performance of the task in reality. Simulators offer a variety of features that make this possible, such as domain randomization and teleportation.

Domain randomization attempts to bridge the reality gap through improved availability of unbiased data. Domain-randomized training data helps make the model more robust in responding to different lighting conditions, floor textures, and random objects in the field of view during inference.

Domain randomization can be achieved in several ways:

  • Light randomization: Change the color and intensity of lights
  • Material randomization: Apply different Substance materials to the desired surfaces
  • Texture randomization: Apply different textures to the materials
  • Color randomization: Apply different colors to the materials
  • Material properties: Vary material properties such as roughness, metallicity and specularity. This can change the friction, the reflective and refractive properties of the surface, and other characteristics.
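In practice, a randomization pass can be as simple as sampling each of these parameters once per generated scene. The sketch below is a minimal Python illustration; the parameter names, ranges, and texture list are assumptions for the example, not Isaac SDK API.

```python
import random

# Hypothetical parameter ranges -- illustrative values, not Isaac SDK settings.
LIGHT_INTENSITY_RANGE = (500.0, 2000.0)   # arbitrary intensity units
ROUGHNESS_RANGE = (0.0, 1.0)
TEXTURES = ["concrete", "carpet", "wood", "tile"]

def sample_randomization(rng=random):
    """Draw one random domain configuration for a simulated scene."""
    return {
        "light_color": [rng.random() for _ in range(3)],        # RGB in [0, 1]
        "light_intensity": rng.uniform(*LIGHT_INTENSITY_RANGE),
        "floor_texture": rng.choice(TEXTURES),
        "material_roughness": rng.uniform(*ROUGHNESS_RANGE),
    }
```

Sampling a fresh configuration for every captured frame is what keeps the training data unbiased with respect to lighting and surface appearance.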

Teleportation functionality enables you to randomly sample camera poses within a certain range (translation and rotation) to capture data from different heights and angles.
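Such pose sampling can be sketched as below, assuming simple uniform bounds on translation and rotation. The function and field names are hypothetical, not the SDK's Teleportation interface.

```python
import math
import random

def sample_camera_pose(t_range, yaw_range, pitch_range, rng=random):
    """Sample a camera pose within user-defined translation and rotation bounds.

    t_range: ((xmin, xmax), (ymin, ymax), (zmin, zmax)) in meters
    yaw_range, pitch_range: (min, max) in radians
    """
    x, y, z = (rng.uniform(lo, hi) for lo, hi in t_range)
    return {
        "translation": (x, y, z),
        "yaw": rng.uniform(*yaw_range),
        "pitch": rng.uniform(*pitch_range),
    }
```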

The Isaac SDK supports two options for simulation: NavSim (which uses the Unity game engine) and Isaac Sim (which uses the Unreal game engine).

Setting Up Communication With the Simulator

The Isaac SDK and simulator communicate using a Pub-Sub architecture. Data is passed back and forth between the two processes by setting up TCP publishers on the side where the data is created, and setting up TCP subscribers on the side where the data needs to be ingested.
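The pattern can be illustrated with a minimal length-prefixed message exchange over a socket pair. This is a conceptual sketch only; the Isaac SDK uses TcpPublisher/TcpSubscriber codelets carrying serialized proto messages, not raw sockets.

```python
import socket

# A socketpair stands in for the TCP connection between the two processes.
pub_sock, sub_sock = socket.socketpair()

def publish(message: bytes):
    """Publisher side: send one length-prefixed message."""
    pub_sock.sendall(len(message).to_bytes(4, "big") + message)

def subscribe() -> bytes:
    """Subscriber side: read one length-prefixed message."""
    length = int.from_bytes(sub_sock.recv(4), "big")
    return sub_sock.recv(length)

publish(b"segmentation_frame_0")
```

The same channel works in both directions, which is how the training application can send teleportation commands back to the simulator.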

  • For NavSim, the application that publishes the ground truth data is packages/navsim/apps/ This is directly loaded by the rng_warehouse scene in NavSim. The application publishes the sensor data to a user-defined port using a TcpPublisher. This data is used by the training application. The training application in turn sends teleportation commands to the NavSim application, which receives them through a TcpSubscriber node. In the Unity scene, domain randomization using Substance is available by default. This allows you to apply different materials to the meshes in the scene, apply random poses to the actors, and so on. The ground is labeled "floor", with a per-pixel index of 1.

  • For IsaacSim, communication is set up through the files in packages/freespace_dnn/apps/training/bridge_config. The graph JSON file defines the components that need to be set up on the simulator side, including the actors, subscribers, and publishers. The config JSON file specifies the configuration of each of these components, including the IP address and port to which the publisher publishes data, the port on which the subscriber accepts data, the initial spawn location of the actor in the map, and so on.

    The path to the JSON file containing the path to the graph and config file must be passed to IsaacSim. This can be found in packages/freespace_dnn/apps/training/bridge_config/training_sim.json.

Autonomous Data Collection in Real Environments

Autonomous data collection using the robot can be a great way to introduce training data captured under real conditions. This can be broadly split into three workflows:

  • Path planning through the map
  • Monitoring robot displacement
  • Ground truth creation

Path Planning Through the Map

  • TravellingSalesman: This codelet plots waypoints over the freely traversable space in the map, and calculates the shortest path. Each waypoint denotes a 2D point in the map.


The travelling salesman path reflects only the graph path through the waypoints. It does not take the reachability of space into account and may draw paths over unreachable areas when visualized.

  • MoveAndScan: This takes a list of 2D waypoints as input and expands them to include multiple orientations. The number of orientations included for each 2D location is user-defined. Hence, if there are N waypoints in the map and M orientations, the output is a list of NxM poses.
  • FollowPath: This takes a list of poses as input and publishes each pose (or waypoint) to the GoTo codelet as a goal. This enables the robot to move to each of the waypoints in order.
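The waypoint expansion performed by MoveAndScan can be sketched as follows, assuming evenly spaced headings. This is an illustrative reimplementation of the idea, not the SDK codelet.

```python
import math

def expand_waypoints(waypoints, num_orientations):
    """Expand 2D waypoints into poses at evenly spaced headings.

    waypoints: list of (x, y) map positions.
    Returns a list of (x, y, theta) poses of length N * M, where N is the
    number of waypoints and M is the number of orientations.
    """
    poses = []
    for x, y in waypoints:
        for k in range(num_orientations):
            theta = 2.0 * math.pi * k / num_orientations
            poses.append((x, y, theta))
    return poses
```

A component like FollowPath would then feed these poses to the navigation goal one at a time.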

Monitoring Robot Displacement

  • NavigationMonitor: This codelet continuously monitors the linear and angular displacement of the robot. If the displacement is greater than a user-defined threshold, it publishes a RobotStateProto message, which contains the current pose, current speed, and displacement since the last update. In this context, the NavigationMonitor codelet mainly acts as a signal to regulate when a pair of proto messages can be logged using the Recorder functionality.
  • Throttle: This codelet regulates one signal with respect to another; in this case, it regulates the camera input with respect to the RobotStateProto output from NavigationMonitor. The main purpose of the Throttle component is to ensure that ground truth data is collected at intervals, so that the log size does not become too inflated in a short period of time.
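The displacement gating described above can be sketched as follows; the pose format and threshold semantics are assumptions for illustration, not the codelet's actual parameters.

```python
import math

def should_log(prev_pose, cur_pose, lin_threshold, ang_threshold):
    """Log only when displacement since the last logged pose exceeds a
    linear or angular threshold (the NavigationMonitor idea).

    Poses are (x, y, theta); thresholds are in meters and radians.
    """
    dx = cur_pose[0] - prev_pose[0]
    dy = cur_pose[1] - prev_pose[1]
    linear = math.hypot(dx, dy)
    # Wrap the heading difference into [-pi, pi] before comparing.
    dtheta = cur_pose[2] - prev_pose[2]
    angular = abs(math.atan2(math.sin(dtheta), math.cos(dtheta)))
    return linear >= lin_threshold or angular >= ang_threshold
```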

Ground Truth Creation

  • RgbdSuperpixels: This computes superpixel clustering for an RGB-D image, using a single-pass clustering algorithm which assigns every pixel to a local cluster based on similarity in color and depth.
  • RgbdSuperpixelFreespace: This component labels every superpixel as either free space or an obstacle. The superpixels are transformed into the ground coordinate frame, assuming that the ground plane conforms to the equation Z = 0.
  • SuperpixelImageLabelling: This component creates a pixel-wise segmentation of the original camera image based on the superpixel labeling.
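The free-space test at the heart of RgbdSuperpixelFreespace can be illustrated as below, assuming the superpixel centroids have already been transformed into the ground frame where the floor satisfies Z = 0. The function name and tolerance are illustrative, not the SDK implementation.

```python
def label_superpixels(centroids_ground_frame, height_tolerance=0.05):
    """Label each superpixel centroid as free space or obstacle.

    centroids_ground_frame: list of (x, y, z) points expressed in the ground
    frame. A point lying on (or within a tolerance of) the Z = 0 plane is
    considered free space; anything rising above it is an obstacle.
    """
    return ["free" if abs(z) <= height_tolerance else "obstacle"
            for _, _, z in centroids_ground_frame]
```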

Deploying and Running the Application

The application can be deployed by running the following command:

bob@desktop:~/isaac$ ./engine/build/ -h <robot_ip> -p //packages/freespace_dnn/apps:autonomous_data_collection-pkg -d jetpack42 --remote-user <username_on_nano>

where <robot_ip> is the IP address of the robot and <username_on_nano> is your username on the robot.

.. note:: If a username is not specified with the --remote-user option, the default username used is :code:`nvidia`.

After deployment, run the application with the following steps:

  1. Log onto the robot (via SSH).

  2. Navigate to the directory in which the app was deployed. This is normally ~/deploy/<user>.

  3. Run the application with the following command:

    ./packages/freespace_dnn/apps/autonomous_data_collection -r <robot ID> -m <map name>

The robot should plot a waypoint graph over the map, navigate to each point and turn a complete circle. NavigationMonitor monitors the displacement and enables logging only at certain intervals. The end result is a log containing paired data for free space segmentation, namely the color image and corresponding segmented image.

Network Architecture

The network best suited to a binary segmentation task would ideally satisfy the following criteria:

  • Easily trainable on a small dataset
  • Trains quickly and has a short inference time
  • Supports TensorRT inference
  • Has a compatible license, so that it can be fully integrated into the Isaac SDK

The network that best meets these criteria is U-Net. This is an end-to-end fully convolutional network (FCN); that is, it contains only convolutional layers and no dense layers.
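The characteristic U-Net pattern (a contracting path, an expanding path, and skip connections that concatenate encoder features onto the decoder path) can be sketched at the shape level with numpy. This is a toy illustration with no learned weights; a real U-Net applies convolutions at every stage.

```python
import numpy as np

def downsample(x):
    """2x2 max pooling over an (H, W, C) feature map (contracting path)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbor upsampling (expanding path)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_like(x):
    """Shape-level sketch of the U-Net pattern: encoder features are kept
    and concatenated with the upsampled decoder features."""
    skip = x                      # encoder feature map, saved for the skip
    bottleneck = downsample(x)    # contracting path
    up = upsample(bottleneck)     # expanding path
    return np.concatenate([skip, up], axis=-1)  # skip connection

out = unet_like(np.zeros((8, 8, 3)))
```

The concatenation is what lets the decoder recover fine spatial detail that pooling discards, which matters for sharp free-space boundaries.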

Training the Network


Multi-GPU training

On multi-GPU host systems, parallelizing the workload across all GPUs can be a powerful capability. Parallelism in Tensorflow can be divided into two types:

  • Data parallelism: Data is distributed across multiple GPUs or host machines.
  • Model parallelism: The model itself is split across multiple machines or GPUs. For example, a single layer can fit into the memory of a single machine (or GPU), and forward and back propagation involve communicating outputs from one host (or GPU) to another.

Tensorflow supports data parallelism through the MirroredStrategyModule library, which mirrors the model graph on each of the GPUs and can hence accept an independent set of data on each GPU for training.
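What this strategy automates can be sketched by hand: shard the batch across replicas, compute per-replica gradients, and average them (an all-reduce) before the weight update. The toy one-parameter linear model below is purely illustrative, not Tensorflow code.

```python
import numpy as np

def grad(w, x, y):
    """Gradient of mean squared error for a 1-D linear model y ~ w * x."""
    return np.mean(2.0 * (w * x - y) * x)

def data_parallel_step(w, x, y, num_replicas, lr=0.1):
    """One data-parallel SGD step: each 'replica' sees its own shard."""
    shards_x = np.array_split(x, num_replicas)
    shards_y = np.array_split(y, num_replicas)
    grads = [grad(w, sx, sy) for sx, sy in zip(shards_x, shards_y)]
    return w - lr * np.mean(grads)  # average gradients, then update
```

Because every replica holds an identical copy of the weights and only gradients are exchanged, the result is equivalent to a larger single-device batch.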

Message Types

Messages are of the following types:

  • TensorProto: Defines an n-dimensional tensor that forms the basis of images and tensor pairs for training.
  • TensorListProto: Defines a list of TensorProto. TensorListProto messages are mainly used to pass around tensors.
  • ImageProto: A special case of TensorProto for tensors limited to three dimensions.
  • ColorCameraProto: Holds a color image, and camera intrinsic information.
  • SegmentationCameraProto: Holds an image containing the class label for every pixel in the image. It also contains the camera intrinsic information, similar to the ColorCameraProto.


The following codelets are used in the training application:

  • TcpSubscriber: Used by the training application to receive data from simulator. Two TcpSubscribers are used in this example, each receiving a color image and detection label from the simulation.
  • ColorCameraEncoderCpu: Takes in a ColorCameraProto and outputs a downsampled image stored in a 3D tensor (WxHx3), published as a TensorListProto containing only one tensor. The downsampling reduces the image to a smaller, user-defined size.
  • SegmentationEncoder: Takes in a SegmentationCameraProto and outputs a 3D tensor (WxHx1). This codelet encodes the ground truth data for binary semantic segmentation by assigning a probability of 1.0 to the pixel positions of the class in consideration and 0.0 to all other pixel positions. The tensor is published as a TensorListProto containing only one tensor.
  • TensorSynchronization: Takes in two TensorListProto inputs and synchronizes them according to their acquisition time. This codelet makes sure that the training code gets synchronized color image and segmentation label data.
  • SampleAccumulator: Takes in the training pairs (image tensor and segmentation label tensor) as a TensorListProto and stores that in a buffer. This codelet is bound to the python script such that the training script can directly sample from this buffer using the acquire_samples() function. The acquire_samples() function converts the TensorListProto into a list of numpy arrays with corresponding dimensions and passes that to Python.
  • Teleportation: Publishes RigidBody3GroupProto in a pre-defined way to randomly change the spawn location.
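The binary encoding performed by SegmentationEncoder can be sketched in numpy. The function name and the class-index argument are illustrative; in the NavSim scene above, for example, the floor class has per-pixel index 1.

```python
import numpy as np

def encode_binary_label(label_image, class_index):
    """Turn a (H, W) label image of per-pixel class indices into a
    (H, W, 1) float tensor: 1.0 where the pixel belongs to the class
    under consideration, 0.0 everywhere else."""
    return (label_image == class_index).astype(np.float32)[..., np.newaxis]
```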



Message Types

Messages are of the following types:

  • TensorProto: Defines an n-dimensional tensor that forms the basis of images and tensor pairs for training.
  • TensorListProto: Defines a list of TensorProto. TensorListProto messages are mainly used to pass around tensors.
  • ImageProto: A special case of TensorProto for tensors limited to three dimensions.
  • ColorCameraProto: Holds a color image, including the camera intrinsic information.
  • SegmentationPredictionProto: Holds a 3D tensor containing the segmentation output from the Neural Network (W x H x Number of classes).


The following codelets are used in the inference application:

  • ZedCamera: Enables data ingestion from the ZED Camera. Encodes the raw image as a ColorCameraProto message containing the RGB image and camera intrinsic information.
  • ColorCameraEncoderCpu: Takes in a ColorCameraProto and outputs a downsampled image stored in a 3D tensor (WxHx3), published as a TensorListProto containing only one tensor. The downsampling reduces the image to a smaller, user-defined size.
  • TensorReshape: Takes a TensorListProto as input and reshapes each TensorProto according to the user-defined size. In this context, this codelet is mainly used to add an extra dimension to the input tensor depicting the batch size, since Tensorflow accepts input in the NHWC format. Consequently, the codelet is also used to remove the first dimension from the output of the neural network.
  • TensorflowInference: Loads the frozen neural network into memory and takes in a TensorListProto as input to pass to the network. Publishes the network output in the form of a TensorListProto.
  • SegmentationDecoder: Takes in the network inference in the form of a TensorListProto, extracts the output tensor from it and publishes it in the form of a SegmentationPredictionProto, which contains both the output tensor as well as the list of class names that have been encoded.
  • BinarizePredictions: Binarizes the inference tensor based on a user-defined threshold. All values greater than or equal to this threshold are set to 1.0 and values less than the threshold are set to 0.0. Ingests a SegmentationPredictionProto as input and publishes the same message type as output.
  • VisualizeSegmentation: This codelet is purely used for visualization. It overlays the segmentation output on the de-normalized output of the ColorCameraEncoder and publishes the channels to Sight.
  • EvaluateSegmentation: This codelet aids in evaluating the output of the network. Given the ground truth and segmentation predictions as input, it computes several metrics including pixel accuracy and intersection over union score and publishes the data to Sight.
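The reshape and thresholding steps described above can be sketched in numpy; the function names are illustrative, not the codelet APIs.

```python
import numpy as np

def add_batch_dim(tensor_hwc):
    """(H, W, C) -> (1, H, W, C): Tensorflow expects NHWC input."""
    return tensor_hwc[np.newaxis, ...]

def remove_batch_dim(tensor_nhwc):
    """(1, H, W, C) -> (H, W, C): strip the batch dimension from the output."""
    return tensor_nhwc[0]

def binarize(prediction, threshold=0.5):
    """Values >= threshold become 1.0, values below become 0.0
    (the BinarizePredictions idea)."""
    return (prediction >= threshold).astype(np.float32)
```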


The training application can be run using the following command:

bob@desktop:~/isaac$ bazel run packages/freespace_dnn/apps/training:training

The training configurations can be found in packages/freespace_dnn/apps/training/segmentation_training_config.json

The logs and checkpoints are stored in /tmp/path_segmentation by default, but this path can be changed in the configuration JSON file. To view the training progress in Tensorboard, run the following command in a terminal:

tensorboard --logdir=/tmp/path_segmentation

The visualization interface can be accessed at http://localhost:6006.

To use NavSim data for training, use the following steps:

  1. Run the rng_warehouse scene, which initializes the TcpPublishers for training data, as mentioned in the Simulation section.
  2. Make sure that the value of the key app_filename is set to packages/freespace_dnn/apps/training/ in packages/freespace_dnn/apps/training/segmentation_training_config.json. This is the default value in the config file.
  3. In the configuration for segmentation_encoder in packages/freespace_dnn/apps/training/freespace_dnn_training.subgraph.json, make sure that the key offset is set to the value 0. This is set by default.

To use IsaacSim data for training, use the following steps:

  1. Run IsaacSim with the path to the JSON file as packages/freespace_dnn/apps/training/bridge_config/training_sim.json
  2. Set the value of the key app_filename to packages/freespace_dnn/apps/training/ in packages/freespace_dnn/apps/training/segmentation_training_config.json.
  3. In the configuration for segmentation_encoder in packages/freespace_dnn/apps/training/freespace_dnn_training.subgraph.json, set the value of the key offset to 1.

Run the training application with the same command as mentioned above.

Freezing the Model

The model checkpoints are stored at periodic intervals during training. The checkpoints are stored in the form of three files:

  • .meta file: Denotes the graph structure of the model
  • .data file: Stores the values of all the variables saved
  • .index file: Stores the list of variable names and shapes

Once the training is complete, serialize the most recent checkpoint as a protobuf file with the following command:

bob@desktop:~/isaac$ python --checkpoint_dir <path to checkpoint directory>
       --output_nodename <name of output node> --output_filename <name of frozen pb file>


Run the inference application with the following command:

bazel run packages/freespace_dnn/apps:path_segmentation_inference

The application takes a monocular camera feed from a ZED camera as input and outputs, for each pixel, the probability that it is free space, with 0.0 being the lowest value and 1.0 the highest.

The default inference application contains nodes to support a Segway and a joystick. Inference can also run without these hardware devices, using just a ZED camera, but there will be errors in logs reporting that the Segway is missing.

Sample inference

[Sample inference output images: inference_sample_1.png, inference_sample_2.png]