Cart Delivery in the Factory of the Future

The cart delivery application in the Factory of the Future (FoF) scene is powered by various Isaac capabilities and features:

  • A high-quality simulation environment with realistic physics

  • A behavior tree defining complex application flow

  • Object detection and 3D pose estimation, fully trained in simulation

  • A choice between driving on the right side of the lane or taking the shortest path

  • Multi-lidar localization and obstacle avoidance

Running the simulator

Open a terminal window, navigate to the folder containing the extracted IsaacSim release archive and execute the following command:

./builds/factory_of_the_future.x86_64 --scene Factory01

This will open a Unity3D window and load the Factory of the Future environment, as well as the robot and a set of dolly carts.

Note

This scene requires an RTX 2080 or better GPU to run at an acceptable (~30Hz) framerate.

The scene features four camera options, which you can switch between using the left and right arrow keys:

  • MainCamera: The camera pose is fixed in the scene, unless changed manually:

    1. Scroll to zoom.

    2. Hold right-click and move the mouse to rotate the view.

    3. Hold middle-click and move the mouse to translate the view.

  • GUICamera: Disables external cameras for higher FPS.

  • FollowRobotCamera: Changes position to follow the robot. Unlike ChaseRobotCamera, this camera does not rotate with the robot.

  • ChaseRobotCamera: Changes position and rotation to chase the robot.

Application for Cart Delivery

Open a new terminal window and run the following command from within the Isaac SDK folder:

bazel run packages/cart_delivery/apps:cart_delivery

To inspect the state of all running Isaac nodes, open Isaac Sight (http://localhost:3000) in a web browser. For an optimal Sight experience, disable any channels that you do not need to visualize.

The robot will now approach and pick up a dolly cart autonomously. It will drop the dolly cart off at a predefined pose, which can be changed through the configuration.

Behavior Tree for Autonomous Dolly Transport

The application for autonomous dolly transport is based on a behavior tree defining perception and navigation tasks. This application illustrates a delivery scenario that could typically be deployed in a warehouse or factory environment.

The high-level sequence of tasks defined by the application is as follows:

  1. The robot starts from its home position.

  2. The robot moves to the cart pickup point, which is a predefined waypoint.

  3. The robot drives under the cart and lifts it.

  4. The robot drives with the cart to the dropoff point, which is another predefined waypoint.

  5. The robot lowers the cart and drops it off. It drives out from under the cart.

  6. The robot returns to its home/idle position.

[Figure: behavior_tree_sequence.png]

The behavior tree subgraph file that controls the sample application as described above can be found in the Isaac SDK at the following path:

packages/cart_delivery/apps/cart_delivery.subgraph.json

The behavior tree currently carries out a single delivery mission, but it can be modified to perform multiple deliveries.
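The control flow of the six-step mission can be illustrated with a minimal sketch of a sequence behavior. This is not the Isaac SDK behavior tree API: the step names, `run_sequence`, and `run_missions` are hypothetical and only show how a sequence stops at the first failing step, and how the same sequence could be repeated for multiple deliveries.

```python
from typing import Callable, List

# Hypothetical task labels mirroring the six steps listed above; they are
# illustrative names, not identifiers from the Isaac SDK.
DELIVERY_STEPS = [
    "start_at_home",
    "go_to_pickup_waypoint",
    "drive_under_cart_and_lift",
    "go_to_dropoff_waypoint",
    "lower_cart_and_drive_out",
    "return_to_home",
]


def run_sequence(steps: List[str], execute: Callable[[str], bool]) -> bool:
    """Run steps in order; stop and report failure at the first failing step."""
    for step in steps:
        if not execute(step):
            return False
    return True


def run_missions(count: int, execute: Callable[[str], bool]) -> int:
    """Repeat the delivery sequence; return how many missions succeeded."""
    completed = 0
    for _ in range(count):
        if not run_sequence(DELIVERY_STEPS, execute):
            break
        completed += 1
    return completed
```

In the real application, each step corresponds to perception and navigation behaviors configured in the subgraph file; here `execute` is a stand-in callback that returns whether a step succeeded.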

Interoperation of Navigation and Perception

The perception pipeline outputs the estimated 3D pose of the cart. This pose is written into the pose tree as the target pose, which the LQR Planner in the Navigation Stack then uses to drive under the cart. The LQR Planner can also be swapped out for the reinforcement learning model to drive under the cart.

[Figure: perception_in_navigation_stack.png]
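To illustrate how a perceived cart pose becomes a navigation target, the following sketch composes two planar poses. It is a simplification under stated assumptions: the actual pipeline estimates a full 3D pose and uses the Isaac pose tree, whereas the `Pose2` class, the frame names, and the numeric values here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2:
    """Planar pose: translation (x, y) in meters and heading in radians."""
    x: float
    y: float
    theta: float

    def compose(self, other: "Pose2") -> "Pose2":
        """Return self * other, i.e. `other` expressed in the parent frame of self."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2(
            self.x + c * other.x - s * other.y,
            self.y + s * other.x + c * other.y,
            self.theta + other.theta,
        )


# Hypothetical values: the robot pose in the world frame and the cart pose
# estimated by perception relative to the robot.
world_T_robot = Pose2(2.0, 1.0, math.pi / 2)
robot_T_cart = Pose2(4.0, 0.0, 0.0)

# The target pose for the planner, expressed in the world frame.
world_T_cart = world_T_robot.compose(robot_T_cart)
```

With these values, the robot faces along the world y-axis, so a cart 4 m ahead of it lies 4 m further along y in the world frame.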

Application for Autonomous Navigation only

Running autonomous navigation without a behavior tree can be useful for testing the navigation stack. First, load the Factory of the Future scene from within the IsaacSim release folder:

./builds/factory_of_the_future.x86_64 --scene Factory01

Then run the following command in the Isaac SDK root folder:

bazel run packages/cart_delivery/apps:navigate -- --more apps/assets/maps/virtual_factory_1.json,packages/cart_delivery/apps/navigation.config.json,packages/navsim/robots/str4.json,packages/cart_delivery/apps/pose2_planner.config.json

To navigate to the desired location, follow these steps:

  1. Right-click the Map View window and choose Settings.

  2. In Settings, click the Select marker dropdown menu and choose “pose_as_goal”.

  3. Click Add marker.

  4. Click Update. The marker will be added to the map. You may need to zoom in on the map to see the new marker. The robot will not immediately begin navigating to the marker.

  5. Click and drag the marker to a new location on the map. See Interactive Markers for more information.

  6. Enable the “disable_deadman_switch” variable in Isaac Sight and click Submit.

The robot will begin to navigate to the marker location.

Application for Perception only

To control the robot manually and test perception algorithms, follow these steps:

  1. Run the following command from within the Isaac SDK root folder:

    bazel run packages/cart_delivery/apps:perception -- --more packages/cart_delivery/apps/detection_pose_estimation.config.json,packages/navsim/robots/str4.json
    
  2. Use a joystick if available. Otherwise, click Virtual Gamepad on the left. Click Connect to Backend on the widget. Select Keypad and use the “wasd” keys to navigate the robot. See Remote Joystick using Sight for more information.

  3. Enable the channels and observe the perception output on Sight.

Trained object detection and pose estimation models allow the robot to infer the 3D pose (3 DoF translation + 3 DoF rotation) of a dolly given only an RGB camera image as input. The detection and pose estimation models work best when the camera is 3 to 7 m away from the cart.

The object detection model outputs a 2D axis-aligned bounding box for a detected dolly. The region of interest for each detection is cropped and given as input to the pose estimation model, which outputs an estimated rotation and translation for each object instance.
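The hand-off between the two models can be sketched as a crop of the detected bounding box from the image. This is a minimal illustration, not the Isaac implementation: the image is a plain nested list rather than a real RGB tensor, and `crop_roi` and the `(x_min, y_min, x_max, y_max)` box convention (maximum exclusive) are assumptions made for the example.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values; stand-in for an RGB image
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def crop_roi(image: Image, box: Box) -> Image:
    """Crop an axis-aligned box from the image, clamping the box to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# A tiny 4x5 test image whose pixel value encodes its row and column.
image = [[row * 10 + col for col in range(5)] for row in range(4)]
roi = crop_roi(image, (1, 1, 3, 4))
```

Each cropped region would then be passed to the pose estimation model along with the bounding box parameters.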

Note

Training the models, as described in the following section, is optional. If you skip it, pre-trained models will be used.

Training the Models

Object Detection Model (DetectNetv2)

The object detection model is a DetectNetv2 model based on the ResNet-18 feature extractor. A pre-trained model based on real images was fine-tuned on simulated images generated from various scene setups in IsaacSim Unity3D. This model will be automatically downloaded as part of the training app.

Sample scenarios are provided in the form of a Unity3D binary to generate randomized images of the dolly for model training purposes. Scenarios 7, 13, 14, and 15 provide training data for dolly detection. To start the scene with scenario 7, for example, run the following command from the IsaacSim release folder:

./builds/factory_of_the_future.x86_64 --scene Factory01 --scenario 7

This simulation must run alongside the generate_kitti_dataset Isaac application, which generates a ground-truth dataset and saves it offline in KITTI format. To collect the dataset, run the following command from inside the Isaac SDK root folder:

bazel run packages/ml/apps/generate_kitti_dataset

By default, the recorded scenes are stored in /tmp/unity3d_kitti_dataset/ and are split into a training set (10000 samples) and a testing set (100 samples). When all samples are generated, the Isaac application terminates.
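As a quick sanity check on a generated dataset, a label line can be parsed as sketched below. This assumes the standard KITTI object label layout of 15 whitespace-separated fields (type, truncated, occluded, alpha, four bounding box values, three dimensions, three location values, rotation_y); the exact files written by the app may differ, and the sample line and the `dolly` class name are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KittiLabel:
    object_type: str
    truncated: float
    occluded: int
    alpha: float
    bbox: Tuple[float, ...]  # left, top, right, bottom in pixels


def parse_kitti_label(line: str) -> KittiLabel:
    """Parse the 2D-detection fields of one KITTI object label line."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError(f"expected at least 15 fields, got {len(fields)}")
    return KittiLabel(
        object_type=fields[0],
        truncated=float(fields[1]),
        occluded=int(fields[2]),
        alpha=float(fields[3]),
        bbox=tuple(float(value) for value in fields[4:8]),
    )


# Hypothetical label line; the 3D fields are zeroed for illustration.
sample = "dolly 0.00 0 0.00 100.00 120.00 300.00 260.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
label = parse_kitti_label(sample)
```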

After generating the dataset, the NVIDIA Transfer Learning Toolkit is used to train the model. For more details on the GenerateKittiDataset app and training using the Transfer Learning Toolkit, refer to the Object Detection with DetectNetv2 chapter in the Isaac SDK documentation.

Pose Estimation Model (Pose CNN Decoder)

The pose estimation model architecture encodes the following:

  • The cropped region of interest from the color image

  • The bounding box parameters

Based on these encodings it then estimates the pose as a regression from the concatenated feature space.

The provided pose estimation model was trained from scratch using only images simulated in IsaacSim Unity3D. A sample scene that generates pose estimation training data is provided in the form of a Unity3D binary. This example requires a large amount of CUDA memory, and running both the simulation and the Isaac application on the same GPU may result in an out-of-memory error on some platforms. Run the following command from the IsaacSim release folder:

./builds/factory_of_the_future.x86_64 --scene Factory01 --scenario 8

To train the model, run this binary alongside the following Isaac application from within the Isaac SDK root folder:

bazel run packages/object_pose_estimation/apps/pose_cnn_decoder/training:pose_estimation_cnn_training

For more details on training the pose estimation model, refer to the 3D Object Pose Estimation with Pose CNN Decoder chapter in the Isaac SDK documentation.

Running the Inference Pipeline

The trained pose estimation model can run inference on several input sources, as described in the following sections. In each case, the results can be observed in Isaac Sight.

Inference on simulation with ground truth bounding box

This example receives images from the simulation and runs the inference pipeline on that image stream. The result can be observed in Isaac Sight.

  1. Start the Factory of the Future simulation from within the IsaacSim release folder:

    ./builds/factory_of_the_future.x86_64 --scene Factory01 --scenario 8
    
  2. Run the following command from within the Isaac SDK root folder:

    bazel run packages/object_pose_estimation/apps/pose_cnn_decoder:pose_estimation_cnn_inference_sim
    

Inference on simulation for detection + pose estimation

This example runs the dolly detection and pose estimation on the live camera image from the simulation.

  1. Start the Factory of the Future simulation from within the IsaacSim release folder:

    ./builds/factory_of_the_future.x86_64 --scene Factory01 --scenario 8
    
  2. Run the following command from within the Isaac SDK root folder:

    bazel run packages/object_pose_estimation/apps/pose_cnn_decoder:detection_pose_estimation_cnn_inference_sim
    

Inference on sample image

This example runs the inference pipeline on a predefined image.

Run the following command from within the Isaac SDK root folder:

bazel run packages/object_pose_estimation/apps/pose_cnn_decoder:detection_pose_estimation_cnn_inference_imagefeeder

Inference on camera feed

This example runs the inference pipeline on a live camera feed. It requires an Intel RealSense camera to be connected to the host.

Run the following command from within the Isaac SDK root folder:

bazel run packages/object_pose_estimation/apps/pose_cnn_decoder:detection_pose_estimation_cnn_inference_camerafeed

Inference on replay of sample Isaac logs

This example runs the inference pipeline on prerecorded Isaac logs.

  1. Run the following command from within the Isaac SDK root folder:

    bazel run packages/object_pose_estimation/apps/pose_cnn_decoder:detection_pose_estimation_cnn_inference_replay
    
  2. Open Isaac Sight in a web browser (http://localhost:3000) and activate the Replay Control Panel in the Windows tab on the top left. In the now visible Replay Control Panel, move the time slider far to the left and click START REPLAY. The inference results will be shown on the prerecorded camera feed.

Detection Inference Parameters

You can modify certain parameters at runtime through Isaac Sight to tune the outputs of the models. Each detection has an associated confidence value, and detections with a confidence below the confidence threshold are filtered out.

To post-process the raw detection outputs from the DetectNetv2 model used for object detection, non-maximum suppression is used to eliminate multiple detections for a single object instance. Decrease the non-maximum suppression threshold to filter out detections that have high intersection-over-union overlap with other detections.
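The effect of the two thresholds can be sketched with a generic greedy non-maximum suppression routine. This is a textbook version written for illustration, not the DetectNetDecoder implementation; the function names and the detection tuples are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]            # (bounding box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    detections: List[Detection],
    confidence_threshold: float,
    nms_threshold: float,
) -> List[Detection]:
    """Drop low-confidence detections, then apply greedy NMS by descending confidence."""
    candidates = sorted(
        (d for d in detections if d[1] >= confidence_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept: List[Detection] = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a stronger kept box.
        if all(iou(box, kept_box) <= nms_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: two overlapping boxes, one separate box, one weak box.
detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),
    ((50, 50, 60, 60), 0.7),
    ((0, 0, 5, 5), 0.2),
]
strict = filter_detections(detections, confidence_threshold=0.3, nms_threshold=0.5)
lenient = filter_detections(detections, confidence_threshold=0.3, nms_threshold=0.7)
```

The two overlapping boxes have an intersection-over-union of about 0.68, so the lower NMS threshold suppresses the weaker of the two while the higher threshold keeps both.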

These parameters can be modified permanently in the detection_pose_estimation.config.json file in the Isaac SDK folder:

packages/cart_delivery/apps/detection_pose_estimation.config.json

They can also be changed during runtime in the Isaac Sight Application Configuration tab. They can be found at the following paths:

detection_pose_estimation.object_detection.detection_decoder ->
isaac.detect_net.DetectNetDecoder -> confidence_threshold
                                     non_maximum_suppression_threshold

After you make the changes, confirm them with Submit.
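For reference, the path above maps onto a nested configuration structure of node name, then component name, then parameter. The sketch below builds such a fragment and prints it as JSON. It assumes the usual Isaac layout of node, component, and parameter keys; the two numeric values are hypothetical placeholders, not recommended settings, and `validate_thresholds` is an illustrative helper rather than part of the SDK.

```python
import json

# Node and component names are taken from the path above; the values are
# hypothetical placeholders chosen only for illustration.
overrides = {
    "detection_pose_estimation.object_detection.detection_decoder": {
        "isaac.detect_net.DetectNetDecoder": {
            "confidence_threshold": 0.6,
            "non_maximum_suppression_threshold": 0.4,
        }
    }
}


def validate_thresholds(config: dict) -> None:
    """Check that every parameter value lies in [0, 1] (assumed valid range)."""
    for node in config.values():
        for component in node.values():
            for name, value in component.items():
                if not 0.0 <= value <= 1.0:
                    raise ValueError(f"{name} must be in [0, 1], got {value}")


validate_thresholds(overrides)
fragment = json.dumps(overrides, indent=2)
print(fragment)
```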

Application for Mapping

To map an environment, follow these steps:

  1. Open a new terminal window and run the following command from within the Isaac SDK folder:

    bazel run packages/cart_delivery/apps:gmapping -- --more packages/navsim/robots/str4.json

  2. Open Isaac Sight (http://localhost:3000) in a web browser. For an optimal Sight experience, disable any channels that you do not need to visualize.

  3. Use a joystick if available. Otherwise, click Virtual Gamepad on the left. Click Connect to Backend on the widget. Select Keypad and use the “wasd” keys to navigate the robot. See Remote Joystick using Sight for more information.

  4. Enable the channels and observe the map generated in Sight.

  5. See GMapping Application for more information.