The object detection workflow in the Isaac SDK uses the NVIDIA object detection DNN architecture, DetectNetv2. It is available on NVIDIA NGC and is trained on a real image dataset. Tools integrated with the Isaac SDK enable you to generate your own synthetic training dataset and fine-tune the DNN with the Transfer Learning Toolkit (TLT). The fine-tuned DetectNetv2 can then be used for inference in your robotics applications.


The following sections explain how to:
- Generate a KITTI dataset from Isaac Sim.
- Fine-tune a pre-trained DetectNetv2 model on the generated dataset.
- Run inference on various inputs using the Isaac TensorRT Inference codelet.

Training a DetectNetv2 model involves generating simulated data and using TLT to train a model on this data. Isaac SDK provides sample models that are used in applications, for example the Cart Delivery application. The following step-by-step instructions walk through how one such model was trained for the industrial dolly (the figure below shows training samples from the Factory of the Future environment). Use these steps as guidelines to train models on your own objects.

This section describes how to 1) run the simulation to generate data, 2) run the Isaac application alongside the simulator to capture the data and save it to a dataset, and 3) verify the generated dataset by visual inspection.
Generating Simulated Data with Unity
Generating data for sample objects from scene binary: Industrial Dolly and Industrial Box
A sample Factory of the Future scene binary that generates the above data for two objects, industrial dolly and industrial box, is available in the isaac_sim_unity3d repository in builds. The Cart Delivery application and the shuffle box applications use models trained on data from this scene.
A subset of the scenarios in the scene are data generation scenes for object detection. Scenarios 7, 13, 14, and 15 provide training data for industrial dolly detection. Scenario 9 provides training data for industrial box detection. To start the scene with scenario 7, for example, run the following command from the Isaac Sim release folder:
./builds/factory_of_the_future.x86_64 --scene Factory01 --scenario 7
Generating data for custom objects from scene source file
To generate data for custom objects, a sample scene is provided in the isaac_sim_unity3d repository in packages/Nvidia/Samples/ObjectDetection/. This sample scene can generate data with randomized backgrounds, occluding objects, lighting conditions, and camera poses.
Objects are spawned in the procedural > objects GameObject. The list of objects for training is in packages/Nvidia/Samples/ObjectDetection/ObjectDetectionAssetGroup. By default, this AssetGroup contains the dolly prefab. Modify the list of GameObjects to match the list of objects you wish to train on by increasing the size of the ObjectDetectionAssetGroup and dragging each new prefab into this list. Each prefab in this list should contain a LabelSetter component that contains the name of the object. If you would like each label from the prefab to be associated with the same instance, add an InstanceLabelGroup component to the prefab as well. For example, if each wheel in the dolly prefab has the “wheel” label, an InstanceLabelGroup component in the game object containing all the wheels would result in one bounding box containing all wheels, instead of four separate boxes, one per wheel.
Modify the MaxCount and MaxTrials parameters in the procedural > objects > Collider Asset Spawner component to reflect the number of objects to spawn each frame. The maxCount parameter specifies the number of objects to spawn. The maxPickTrials and maxPlaceTrials values denote how many times each object should be placed again if the initial spawning location is invalid. Additionally, the Dropout parameter under procedural > objects > Collider Asset Spawner represents the probability of an asset being “dropped out” of the frame (the default value is 0.2). Increasing this value results in a dataset with more negative samples, which should be present in the dataset to minimize false positives during inference.
Modify the ClassLabelManager game object in the scene. By default, it contains one class label rule (dolly) and two class labels (one for background and one for dolly). Modify this such that there is one class label rule and one class label per object in your ObjectDetectionAssetGroup. Set the “name” and “expression” fields to the label of the object; this should match the string that was set as the label in LabelSetter in step (c). Make sure that the rule index of each object class label is the same as its class label index (for example, the dolly uses index 1 by default). The index value is used to set the pixels in the label image that is later used to generate bounding boxes. Leave the “Default Label” field set to 0, as this is the value used to populate all the pixels that are not associated with objects (background pixels).
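For intuition only, the sketch below (not part of the Isaac SDK; the function name and array layout are assumptions) shows how per-pixel class indices in a label image can be turned into a bounding box, which is the role the class label index plays in the pipeline described above.

import numpy as np

def box_from_label_image(label_image, class_index):
    """Return one (x_min, y_min, x_max, y_max) box covering all pixels of class_index.

    label_image: 2D array of per-pixel class indices, with 0 as the background
    value (matching the "Default Label" of 0 described above).
    """
    ys, xs = np.where(label_image == class_index)
    if ys.size == 0:
        return None  # the class does not appear in this frame
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Example: a tiny 4x6 label image where class index 1 (e.g. the dolly) fills a small region.
label_image = np.zeros((4, 6), dtype=np.uint8)
label_image[1:3, 2:5] = 1
print(box_from_label_image(label_image, class_index=1))  # (2, 1, 4, 2)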
Running the Isaac Application to Generate a KITTI Dataset
Configure parameters for the dataset in packages/ml/apps/generate_kitti_dataset/generate_kitti_dataset.app.json. Here the config can be modified to vary, among other parameters, the dataset output location, the output resolution of the images (for best results, use dimensions that are multiples of 16), the number of training images, and the number of testing images to create. The default application generates a dataset of 10k training images and 100 testing images; all images are in PNG format with a resolution of 640x368.
Run the following application alongside a simulation to generate a dataset for input to the TLT training pipeline:
bob@desktop:~/isaac/sdk$ bazel run packages/ml/apps/generate_kitti_dataset
On completion, the application creates a directory (/tmp/unity3d_kitti_dataset by default) with the following structure:
unity3d_kitti_dataset/
training/
image_2/ [training images]
000001.png
000002.png
...
label_2/ [training labels in kitti format]
000001.txt
000002.txt
...
testing/
image_2/ [testing images]
000001.png
000002.png
...
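Each file in label_2/ follows the standard KITTI object detection label format: space-separated fields per line, with the class name first and the 2D bounding box corners in fields 5 through 8 (the remaining 3D fields are unused for 2D detection). As a rough, hypothetical illustration (not a tool shipped with the Isaac SDK), the labels can be read like this:

def parse_kitti_label_line(line):
    """Parse one line of a KITTI-format label file into (class_name, 2D box).

    Standard KITTI layout per line:
      type truncated occluded alpha x_min y_min x_max y_max ... (3D fields follow)
    """
    fields = line.split()
    class_name = fields[0]
    x_min, y_min, x_max, y_max = (float(value) for value in fields[4:8])
    return class_name, (x_min, y_min, x_max, y_max)

# Example usage on one of the generated training labels:
with open("/tmp/unity3d_kitti_dataset/training/label_2/000001.txt") as label_file:
    for line in label_file:
        print(parse_kitti_label_line(line))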
Verifying the KITTI Dataset
To verify the integrity of the dataset, a tool is provided to visually analyze the images and bounding boxes in the dataset. The following application takes in a path to a generated dataset and selects N images to verify. For each image, a cropped image is saved to a folder in the dataset named verify.
bob@desktop:~/isaac/sdk$ bazel run packages/ml/apps/generate_kitti_dataset:verify_kitti_dataset -- --dataset_path <your_dataset_path> --num_images <number_of_images_to_verify>
Create a local directory called tlt-experiments to mount in the docker container. Move the unity3d_kitti_dataset directory into this directory. Follow these instructions from IVA to set up docker and NGC.
Start a docker container and mount the directory with the following command. With Isaac 2020.2, the v2.0_dp_py2 container is supported, and includes all the necessary files to train a DetectNetv2 model.
docker run --runtime=nvidia -it -v <path_to_tlt-experiments>:/workspace/tlt-experiments -p 8888:8888 nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2
Navigate to the /workspace/examples/detectnet_v2/ directory in the docker image. Copy the /workspace/examples/detectnet_v2/specs folder into your /workspace/tlt-experiments folder. We will later modify these specs in the mounted folder so that the training specs persist after the docker container is terminated. Start a Jupyter notebook server as described in the TLT documentation:
jupyter notebook --ip 0.0.0.0 --allow-root
Open the detectnet_v2.ipynb notebook and follow the instructions, taking into account these special instructions for each step.
Set up env variables:
- $KEY: Create a “key”, which is used to protect trained models and must be known at inference time to access model weights.
- $USER_EXPERIMENT_DIR: Leave this set to /workspace/tlt-experiments.
- $DATA_DOWNLOAD_DIR: Set this to the path of your unity3d_kitti_dataset.
- $SPECS_DIR: Set this to the path of the copied specs directory within the mounted folder from step #6.
- Verify the downloaded dataset: Skip the first two cells, which download a KITTI object detection dataset into the $DATA_DOWNLOAD_DIR specified above. The simulated dataset from Unity3D should already be at this path, so run the last two cells of this section to validate your simulated dataset.
- Prepare tf records from the KITTI format dataset: Modify the $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt file to reflect the correct dataset path. An example is provided below for training dolly detection.

kitti_config {
  root_directory_path: "/workspace/tlt-experiments/unity3d_kitti_dataset/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/workspace/tlt-experiments/unity3d_kitti_dataset/training"
Then run the cells as instructed in the notebook. The cell containing the tlt-dataset-convert command outputs a message regarding the class map, such as the one below. Note the “label in tfrecords file” value: it is used as the key when writing the training configuration in step (e).

2020-05-09 01:30:12,694 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Class map.
Label in GT: Label in tfrecords file
dolly: dolly
For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap.
Download the pre-trained model: Run the cells as instructed in the notebook.
Modify the training parameters for object classes in $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt for your use case:
First, change the dataset_config > data_sources > image_directory_path and tfrecords_path to the training folder inside your generated dataset:

dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/unity3d_kitti_dataset/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/unity3d_kitti_dataset/training"
  }
Update the list of target_class_mapping parameters, adding one for each object class. For each object, the key field of this struct must exactly match the corresponding “label in tfrecords file” from step 9c.

target_class_mapping {
  key: "dolly"
  value: "dolly"
}
Edit the output_image_width and output_image_height parameters under augmentation_config > preprocessing.

preprocessing {
  output_image_width: 640
  output_image_height: 368
  ...
}
Under the postprocessing_config header, make sure there is one target_class_config configuration per object class. Leave the clustering_config set to default values.

target_class_config {
  key: "dolly"
  value {
    clustering_config {
      ...
    }
  }
}
Use the default values for the model_config section.
Modify the evaluation_config section. Edit the validation_period_during_training parameter to change the number of epochs between validation steps. Make sure there is one minimum_detection_ground_truth_overlap and one evaluation_box_config struct for each object class, using the default values within the struct:

evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "dolly"
    value: 0.5
  }
  evaluation_box_config {
    key: "dolly"
    value {
      ...
    }
  }
  ...
}
In cost_function_config, make sure that there is one target_classes struct per object class, using the default values within the struct.
Note: The cost_function_config section contains parameters for setting weights per class for the calculation of the loss or cost.
Modify the training_config section. In this example, the images are 640x368, so the batch_size_per_gpu can be increased to 16 for faster learning, thus allowing for reduction of the num_epochs to 100. Use the default values for the learning_rate, regularizer, optimizer, and cost_scaling parameters, keeping in mind that these can be adjusted if needed. By default, the training outputs a model checkpoint every 10 epochs; modify the checkpoint_interval parameter to change this frequency.
Modify the bbox_rasterizer_config section to have one target_class_config per object class. For the dolly object, these values were used:

bbox_rasterizer_config {
  target_class_config {
    key: "dolly"
    value: {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  ...
}
For more guidance on these training parameters, see the TLT documentation and this blog post.
- Run TLT training using the tlt-train command, as shown in the notebook.
- Evaluate the trained model: Run the tlt-evaluate command as shown in the notebook to evaluate the final trained model. You can also evaluate any of the checkpoint models using the -m flag with the path of the model.step-xxx.tlt files.
- Prune the trained model to reduce the number of parameters, thus decreasing inference runtimes and the overall size of the model. To prune, run the tlt-prune command as shown in the notebook. Read the pruning instructions and adjust the pruning threshold accordingly. A pth value of 0.01 is a good starting point for detectnet_v2 models. NVIDIA recommends a pruning ratio between 0.1 and 0.3.
- Retrain the pruned model by modifying the $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt file, similar to $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt. Update the model_config so that the load_graph option is set to true. Make sure to also set the correct path to the pruned model from the previous step in the pretrained_model_file parameter under model_config.
- Evaluate the retrained model: Run the tlt-evaluate command as shown in the notebook to evaluate the final trained model. You can also evaluate any of the checkpoint models using the -m flag with the path of the model.step-xxx.tlt files.
- Edit the $SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt file to set inference parameters. In the inferencer_config, set the target classes and inference dimensions accordingly, and provide the correct path to the model to be used for inference. In the bbox_handler_config, make sure there is one classwise_bbox_handler_config per class with the appropriate key, in addition to the default classwise_bbox_handler_config.
- Visualize inferences using the tlt-infer command as shown in the notebook. Update the -i flag to the testing directory of the simulated dataset and the -m flag to the path of the retrained model.
- After the model is trained, pruned, and evaluated to your satisfaction, export it using the tlt-export command under the “Deploy!” section of the notebook. This provides you with a file in .etlt format, which you can then use for inference with Isaac.
A sample DetectNetv2 model that was trained using the above workflow is provided. This model was trained on a different dolly than the one shown above, but with the same configuration. In addition, a sample inference application is provided in packages/detect_net/apps, utilizing the detect_net_inference subgraph located in the same folder. With this app, you can do the following:
- Run inference on a set of real images:
bob@desktop:~/isaac/sdk$ bazel run packages/detect_net/apps:detect_net_inference_app -- --mode image --rows 480 --cols 848
- Run inference on a recorded Isaac log:
bob@desktop:~/isaac/sdk$ bazel run packages/detect_net/apps:detect_net_inference_app -- --mode cask --rows 480 --cols 848
- Run inference on an image stream from Isaac Sim Unity3D:
bob@desktop:~/isaac/sdk$ bazel run packages/detect_net/apps:detect_net_inference_app -- --mode sim
- Run inference on a camera feed from an Intel Realsense camera:
bob@desktop:~/isaac/sdk$ bazel run packages/detect_net/apps:detect_net_inference_app -- --mode realsense
- Run inference on a camera feed from a V4L camera (be sure to adjust the framerate and resolution according to your camera):
bob@desktop:~/isaac/sdk$ bazel run packages/detect_net/apps:detect_net_inference_app -- --mode v4l --fps 30 --rows 448 --cols 800
Another inference application, /packages/object_pose_estimation/detect_net/apps/detect_net_inference_deploy_app.json, is provided that does not include the sample log data for industrial dolly and box, which allows for faster deployment of the package. It is therefore recommended to use this application instead of detect_net_inference_app.json when deploying the package. To use the app, replace detect_net_inference_app in the commands listed above with detect_net_inference_deploy_app to run inference in the different modes. The app includes sample images for industrial dolly and box to test inference in image mode.
When performing inference on the sample model, the resolution of input images must be greater than or equal to 640x368. The inference application uses the ColorCameraEncoder codelet to downscale input images to match the network input resolution, which is 640x368 for the provided sample dolly detection network. However, ColorCameraEncoder does not support upscaling, so images that are input to the inference applications cannot have a smaller resolution than the network input resolution in either dimension.
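The following is a minimal pre-check you could run on your own input images before feeding them to the applications; it is a rough stand-in for the constraint described above (downscale only, never upscale), not a reimplementation of ColorCameraEncoder.

import cv2

NETWORK_WIDTH, NETWORK_HEIGHT = 640, 368  # input resolution of the sample dolly detection network

def check_and_downscale(image):
    """Reject images smaller than the network input and downscale larger ones."""
    height, width = image.shape[:2]
    if width < NETWORK_WIDTH or height < NETWORK_HEIGHT:
        raise ValueError(
            f"Input {width}x{height} is smaller than {NETWORK_WIDTH}x{NETWORK_HEIGHT} "
            "in at least one dimension; upscaling is not supported.")
    return cv2.resize(image, (NETWORK_WIDTH, NETWORK_HEIGHT), interpolation=cv2.INTER_AREA)

# Example: validate a hypothetical 848x480 test image before running inference on it.
frame = cv2.imread("sample_848x480.png")
resized = check_and_downscale(frame)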
Inference on custom models
These applications can be configured to run inference on your own trained models. The above sample applications use the configuration provided in sdk/packages/detect_net/apps/detect_net_industrial_dolly.config.json. Create a similar configuration file for your model with the appropriate ETLT model path and password. By default, inference supports a single-object model trained on a 640x368 input resolution. Pass this new config to the application using the --config command line parameter. Note that if the number of objects or the resolution of the images differs from this, the input tensor info under “detect_net_inference.tensor_encoder” must be updated in the detect_net_inference subgraph.
Detection Inference Parameters
During inference, a set of parameters dictates the post-processing of the raw detections output by the neural network. Specifically, note the following parameters of the detection decoder. The same configuration file that holds the model path and password provides default values for these parameters, which should be tuned for each newly trained model based on the inference settings.
- confidence_threshold: Each detection has an associated confidence value, and the confidence threshold filters out all detections with a confidence below the threshold.
- non_maximum_suppression_threshold: To post-process the raw detection outputs from the DetectNetv2 model used for object detection, non-maximum suppression is used to eliminate multiple detections for a single object instance. Decrease the non-maximum suppression threshold to filter out detections that have high intersection-over-union overlap with other detections.
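To make the effect of these two thresholds concrete, here is a generic post-processing sketch (an illustration of the standard technique, not the Isaac detection decoder itself): detections below the confidence threshold are dropped, then greedy non-maximum suppression removes boxes that overlap a stronger detection by more than the NMS threshold.

def iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def decode_detections(detections, confidence_threshold, nms_threshold):
    """detections: list of (box, confidence). Returns the detections that survive decoding."""
    # 1. Confidence threshold: discard weak detections.
    candidates = sorted((d for d in detections if d[1] >= confidence_threshold),
                        key=lambda d: d[1], reverse=True)
    # 2. Non-maximum suppression: keep the strongest box, suppress heavy overlaps.
    kept = []
    for box, confidence in candidates:
        if all(iou(box, kept_box) <= nms_threshold for kept_box, _ in kept):
            kept.append((box, confidence))
    return kept

# Two overlapping detections of the same dolly plus one weak detection:
detections = [((100, 100, 300, 250), 0.9), ((110, 105, 305, 255), 0.6), ((400, 80, 500, 200), 0.2)]
print(decode_detections(detections, confidence_threshold=0.5, nms_threshold=0.4))
# Only the strongest of the overlapping pair is kept; the 0.2 detection is filtered out.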
This sample was trained on a limited dataset and is not guaranteed to work in every situation and lighting condition. To improve model accuracy in a custom environment, you can train your own model using the instructions provided above.
The object pose estimation pipeline is one of the many use cases for DetectNet. For more information about the pose estimation pipeline, refer to the 3D Object Pose Estimation with Pose CNN Decoder documentation.
Evaluation of a model can help improve the model in several ways:
- Data validation: A model is only as good as the data it was trained on. There are many aspects to a training dataset that can affect performance: data integrity, class balance/imbalance, etc.
- Model improvement: Developers may wish to make incremental changes to model architectures, hyperparameters, etc. in order to explore their effects on performance.
One of the most common metrics used to evaluate object detection models is average precision (AP). Precision is calculated as \((\text{true positives}) / (\text{true positives} + \text{false positives})\), and AP is the precision averaged over image frames. Average recall (AR) is also an important measure, where recall is \((\text{true positives}) / (\text{true positives} + \text{false negatives})\). Precision quantifies how well each prediction made by the network matches a ground truth object, while recall captures how many ground truth objects are identified by the network.
The basic values needed to calculate the above metrics are the true positive (TP), false positive (FP), and false negative (FN) scores. In other words, we need to build a confusion matrix for the inference results. To determine if a prediction and a ground truth bounding box match well enough to consider it a true positive, we use the IOU (Intersection over Union) threshold. IOU is a measure of how much two bounding boxes overlap (0 being no overlap, and 1 being an exact match). Setting a lower IOU threshold corresponds to higher tolerance for bounding box errors. We define true positives as the bounding box pairs for which the IOU score is greater than the IOU threshold. The following image shows the ground truth box in black and the predicted bounding box in green for a sample image.
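The sketch below illustrates this matching procedure in generic terms (it is not the Isaac evaluation code): each prediction is matched against the best-overlapping unmatched ground truth box and counted as TP or FP, and any ground truth boxes left unmatched become FN.

def iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes (same helper as above)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - intersection)
    return intersection / union if union > 0 else 0.0

def confusion_counts(predictions, ground_truths, iou_threshold=0.5):
    """Count (TP, FP, FN) for one frame at a given IOU threshold."""
    unmatched_ground_truths = list(ground_truths)
    true_positives = false_positives = 0
    for predicted_box in predictions:
        best_match = max(unmatched_ground_truths, key=lambda gt: iou(predicted_box, gt), default=None)
        if best_match is not None and iou(predicted_box, best_match) >= iou_threshold:
            unmatched_ground_truths.remove(best_match)  # each ground truth box is matched at most once
            true_positives += 1
        else:
            false_positives += 1
    false_negatives = len(unmatched_ground_truths)
    return true_positives, false_positives, false_negatives

# One good match, one spurious detection, one missed object:
predictions = [(98, 102, 305, 248), (600, 50, 700, 150)]
ground_truths = [(100, 100, 300, 250), (10, 10, 80, 90)]
print(confusion_counts(predictions, ground_truths, iou_threshold=0.5))  # (1, 1, 1)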

The detection evaluation pipeline in Isaac SDK is set up to ingest data in Isaac log/cask format to ensure that the pipeline is agnostic to the source of evaluation data, i.e., simulation or real data. The evaluation workflow provided below walks through the process of collecting evaluation raw data and associated ground truth, collecting prediction data, and evaluating the metrics defined above by comparing ground truth against predictions.
Evaluation Data Collection
In order to evaluate a model, a set of single image data samples and associated ground truth are required in Isaac log/cask format. The evaluation pipeline requires one log/cask containing image data, and one containing associated ground truth data (where the timestamp associates the ground truth message with an image frame). The image data cask must contain ImageProto messages and the ground truth data cask must contain Detections2Proto messages. This data can be either real or simulated.
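The exact cask reading APIs are not shown here, but the idea of using timestamps to associate ground truth messages with image frames can be sketched generically as follows (the message layout and tolerance are illustrative assumptions, not the Isaac implementation):

def pair_by_timestamp(image_messages, detection_messages, tolerance_ns=1_000_000):
    """Pair each image with the ground truth detections closest to it in time.

    image_messages / detection_messages: lists of (acquisition_time_ns, payload),
    as one might read them back from the image cask and the ground truth cask.
    Images with no ground truth within the tolerance are dropped.
    """
    pairs = []
    for image_time, image in image_messages:
        best_time, best_detections = min(detection_messages, key=lambda m: abs(m[0] - image_time))
        if abs(best_time - image_time) <= tolerance_ns:
            pairs.append((image, best_detections))
    return pairs

# Example with nanosecond timestamps: the second frame has no matching ground truth.
images = [(1_000_000_000, "frame_0"), (2_000_000_000, "frame_1")]
ground_truth = [(1_000_200_000, ["dolly_box"])]
print(pair_by_timestamp(images, ground_truth))  # [('frame_0', ['dolly_box'])]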
From simulation
Simulation provides easy access to an unlimited amount of labeled data for evaluation. The evaluation pipeline is set up to consume data in Isaac log/cask format, so data samples from simulation along with the ground truth pose are recorded in cask format using an Isaac SDK application:
bob@desktop:~/isaac/sdk$ bazel run packages/ml/apps/record_sim_ground_truth:record_sim_ground_truth -- --mode bounding_box
This command connects to Isaac Sim and collects RGB image samples along with ground truth bounding boxes of the dolly.
By default, the image data cask is saved to /tmp/data/raw and contains one channel (‘color’). The ground truth data cask is saved to /tmp/data/ground_truth and contains one channel (‘bounding_boxes’). Note the following important arguments to record detection data:
- --mode: This should be set to bounding_box for detection data collection.
- --image_channel: The name of the image channel sending the RGB data in simulation. Defaults to “color”.
- --intrinsics_channel: The name of the image intrinsics channel containing the pinhole model of the camera. Defaults to “color_intrinsics”.
- --segmentation_channel: The base name of the segmentation channels containing the class, labels, and instance segmentation information from the camera. Defaults to “segmentation”.
- --runtime: Run time of the simulation to collect the data samples. The default is set to 20 seconds. Increase this value to collect more evaluation data samples.
Each run of the application saves one image and ground truth data cask pair. This way, multiple sets of data can be collected for evaluation in the form of multiple logs.
This recording application must be run alongside a simulation, similar to the generate_kitti_dataset application described above. To collect evaluation data for a custom object, follow the steps listed in the above section titled “Generating data for custom objects from scene source file”.
For dolly detection evaluation, two evaluation scenarios are provided in the Factory of the Future scene.
Scenario 17 spawns a cart in front of the camera at various angles and positions between 1.5 and 2.5 meters from the camera. To run Scenario 17, use the following command from within the Isaac Sim release folder:
./builds/factory_of_the_future.x86_64 --scene Factory01 --scenario 17
Scenario 18 spawns multiple carts along the robot’s path as it drives along the factory floor. To run Scenario 18, use the following command from within the Isaac Sim release folder:
./builds/factory_of_the_future.x86_64 --scene Factory01 --scenario 18
From real data
To evaluate on real data, collect an image data cask using the Isaac record component.
One way to collect appropriate ground truth data for this real data is to use the CVAT tool. CVAT XML data can then be converted to a ground truth cask for the evaluation pipeline. An example application to perform this conversion is provided.
The following are the important input arguments needed for this script:
- --cvat_xml: Path to the input CVAT XML file. This argument is required.
- --slice_mode: The slicing mode to determine which detections to extract. By default this is set to all, meaning that all the detections found in the XML are extracted. One other slicing mode is available (dolly), which can be used to slice out dolly detections from the sample CVAT file described below.
- --base_directory_gt: The directory to save the generated ground truth detections cask. By default this is /tmp/data/ground_truth.
- --raci_metadata: Saves JSON metadata along with the cask to <app_uuid>_md.json. This is required to use the following steps in the evaluation pipeline.
- --no_raci_metadata: No metadata saved.
To run the application on a sample CVAT file with ground truth dolly detections, run the following command:
bob@desktop:~/isaac/sdk$ bazel run packages/detect_net/evaluation:cvat_to_cask -- --cvat_xml external/detect_net_cvat_sample_data/data/cvat/0c2d809a-38cd-11ea-8bb7-79860d087101.labels.cvat.images.xml --slice_mode dolly
This application creates a cask and saves it to /tmp/data/ground_truth. The associated image cask is a single Isaac log located at isaac/sdk/bazel-sdk/external/detect_net_cvat_sample_data/data/raw/0c2d809a-38cd-11ea-8bb7-79860d087101. To keep consistent with the workspace organization in /tmp/data, this cask directory should be moved to the /tmp/data/raw directory.
Please note that this application serves as an example of data ingestion from the CVAT XML format.
The sample CVAT file contains many detections of various classes, and the application slices out
dolly detections to save to the cask. To write a custom slice mode, modify the slice_detections
function for your use case.
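As a rough illustration of what such a slice mode does (the actual slice_detections signature in the Isaac script may differ), the snippet below pulls only the boxes of a chosen label out of a CVAT image-annotation XML file:

import xml.etree.ElementTree as ET

def slice_boxes_by_label(cvat_xml_path, wanted_label):
    """Yield (frame_name, (x_min, y_min, x_max, y_max)) for every box with the wanted label.

    CVAT image-annotation XML stores one <image> element per frame with nested
    <box> elements carrying the label and corner coordinates as attributes.
    """
    root = ET.parse(cvat_xml_path).getroot()
    for image in root.iter("image"):
        for box in image.iter("box"):
            if box.get("label") != wanted_label:
                continue  # skip detections of other classes
            corners = tuple(float(box.get(key)) for key in ("xtl", "ytl", "xbr", "ybr"))
            yield image.get("name"), corners

# Example: keep only dolly boxes from a hypothetical CVAT export.
for frame_name, box in slice_boxes_by_label("annotations.cvat.images.xml", "dolly"):
    print(frame_name, box)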
Collecting prediction data: Inference Recording
In this step, the application replays all the image cask files in a given input directory, runs the detection inference application and records the inferred detections as a cask log for each input image cask file.
The following are the important input arguments needed for this script:
- --inference_app: The path to the application file that replays a log, performs inference, and records results. By default this is the detection inference application.
- --config: The config file to load for inference parameters for the above inference app. By default this is the dolly inference configuration used in the above sample inference apps.
- --raci_metadata: Saves JSON metadata along with the cask to <app_uuid>_md.json. This is required to use the following steps in the evaluation pipeline.
- --no_raci_metadata: No metadata saved.
- --input_cask_workspace: The workspace containing the input cask files. Input image logs must be in the data/raw directory inside this workspace.
- --output_cask_workspace: The output cask files are written to data/<output_directory_name> inside this workspace. If this parameter is not set, it is assumed to be the same as the input_cask_workspace.
- --output_directory_name: Base directory name to which to write the predictions cask output. Cask files are created in <output_cask_workspace>/data/<output_directory_name>. The default is set to “predictions”.
Assuming that the image casks are stored in /tmp/data/raw, the predicted detections can be generated by running the following command from the Isaac SDK directory:
bob@desktop:~/isaac/sdk$ bazel run packages/ml/apps/evaluation_inference:evaluation_inference -- --input_cask_workspace /tmp
With the above command, the output prediction detection casks are stored in /tmp/data/predictions. The output detection casks are given the same names as the input image casks, with an additional tag so that each image cask and its corresponding prediction cask can be associated in the next step for model evaluation.
Object Detection Cask Evaluation
Once the image, ground truth, and prediction data are collected, the evaluation metrics can be computed. This step reads
the full list of casks in the image cask directory and their corresponding ground truth data and predictions, computes
framewise metrics, and aggregates the metrics. The configuration file located in
packages/detect_net/evaluation/detect_net_cask_evaluation.config.json
is used to set the evaluation parameters
including the IOU thresholds, outlier thresholds, and KPI thresholds. The default values in this file are for the dolly model.
The following are input arguments needed for this script:
- --config: The path to the config file for evaluation. The default is set to packages/detect_net/evaluation/detect_net_cask_evaluation.config.json.
- --image_cask_dir: The path to the image cask directory. Only the image logs must be placed in this directory. The data is aggregated over all the logs in this directory. The default path is set to /tmp/data/raw.
- --gt_cask_dir: The path to the ground truth cask directory corresponding to the image casks in the image_cask_dir. The default path is set to /tmp/data/ground_truth.
- --predicted_cask_dir: The path to the predicted cask directory corresponding to the image casks in the image_cask_dir. It is expected to contain 2D detections as well if use_2d_detections is set to true. The default path is set to /tmp/data/predictions.
- --results_dir: The path to store the evaluation results. The directory is created if not already present. The default path is set to /tmp/data/results.
- --save_outliers [true/false]: If true, saves the outliers to disk under the results directory. By default, this value is false, as it may take some time to save each frame to disk if there are many outliers.
To run the application, use the following command:
bob@desktop:~/isaac/sdk$ bazel run packages/detect_net/evaluation:detect_net_cask_evaluation
The evaluation results are stored as a JSON file in the specified results directory. Under the “results” tag, you will find the number of frames that were evaluated, the precisions, recalls, and confusion matrices per IOU, and a list of the outlier indices. The outliers are determined for a certain IOU threshold specified in the config file by outlier_iou_area_threshold. The three types of outliers that are extracted are images with false positives, false negatives, and large bounding box errors (the threshold for “large” is the large_bbox_iou_area_min parameter in the config file). Finally, the mAP and mAR values across all the evaluated frames are computed and output at the end of the results section.
The KPI_mAP and KPI_mAR values are computed over all IOUs and all classes, as done in the COCO 2017 challenge. The KPI_mAP_lowest_IOU and KPI_mAR_lowest_IOU values are computed over a single IOU (specifically, the lowest one, 0.5 by default) over all the classes, as done in the PASCAL VOC2007 challenge. The final KPI_pass output is true if all four KPI values meet the thresholds specified in the config file.
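The sketch below shows how such KPIs can be aggregated from per-IOU, per-class AP values; it is an illustrative approximation of the COCO-style and lowest-IOU averaging described above, not the Isaac evaluation code (mAR would be aggregated the same way from per-class AR values).

def aggregate_map_kpis(ap_per_iou, kpi_thresholds):
    """Aggregate AP values into COCO-style and lowest-IOU KPIs.

    ap_per_iou: dict mapping IOU threshold -> {class name: AP}.
    kpi_thresholds: minimum acceptable value per KPI name.
    """
    all_aps = [ap for per_class in ap_per_iou.values() for ap in per_class.values()]
    lowest_iou = min(ap_per_iou)
    lowest_iou_aps = list(ap_per_iou[lowest_iou].values())

    kpis = {
        # Mean over every IOU threshold and every class (COCO 2017 style).
        "KPI_mAP": sum(all_aps) / len(all_aps),
        # Mean over classes at the single lowest IOU threshold (PASCAL VOC2007 style).
        "KPI_mAP_lowest_IOU": sum(lowest_iou_aps) / len(lowest_iou_aps),
    }
    kpis["KPI_pass"] = all(kpis[name] >= threshold for name, threshold in kpi_thresholds.items())
    return kpis

# Example with two IOU thresholds and a single class ("dolly"):
ap_per_iou = {0.5: {"dolly": 0.92}, 0.75: {"dolly": 0.78}}
print(aggregate_map_kpis(ap_per_iou, {"KPI_mAP": 0.8, "KPI_mAP_lowest_IOU": 0.9}))
# KPI_mAP averages to ~0.85, KPI_mAP_lowest_IOU is 0.92, so KPI_pass is True.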
This evaluation pipeline is provided as a tool to aid in improving model performance. If evaluation results do not meet standards, consider modifying the training data to better reflect the distribution of the data used for evaluation. The outlier results can aid in error analysis: Investigate the failure cases and outliers on which the model performs poorly, and use these cases to inspire the training scene in simulation. Also, consider re-training with data from multiple simulation environments (scenes) and then adjusting the amount of training data per scene based on the outlier analysis.