Evaluating the Model
Once the model has been trained using the experiment config file and by following the steps to train a model, the next step is to evaluate the model on a test set to measure its accuracy. The TLT toolkit includes the tlt-evaluate command to do this.
The classification app computes evaluation loss, Top-k accuracy, precision, and recall as metrics. Meanwhile, tlt-evaluate for DetectNet_v2, FasterRCNN, RetinaNet, DSSD, YOLOv3, and SSD computes the Average Precision per class and the mean Average Precision metrics as defined in the Pascal VOC challenge. Both SAMPLE and INTEGRATE modes are supported for calculating average precision: the former was used in VOC challenges before 2010, while the latter has been used from 2010 onwards. The SAMPLE mode uses an 11-point method to compute the AP, while the INTEGRATE mode uses a more fine-grained integration method and yields a more accurate AP value. MaskRCNN reports COCO's detection evaluation metrics; AP50 in the COCO metrics is comparable to mAP in the Pascal VOC metrics.
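For reference, the two modes correspond to the standard Pascal VOC definitions: the SAMPLE (11-point) method averages the maximum precision found at eleven equally spaced recall levels, while the INTEGRATE method accumulates the area under the full precision/recall curve. As a sketch, with p(r) denoting precision at recall r:
AP_sample    = (1/11) * sum over r in {0, 0.1, ..., 1.0} of max over r' >= r of p(r')
AP_integrate = sum over n of (r_(n+1) - r_n) * p_interp(r_(n+1)), where p_interp is the interpolated (maximum-to-the-right) precision
These are the general VOC definitions only; the exact implementation inside tlt-evaluate is not shown here.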
When training is complete, the model is stored in the output directory of your choice in $OUTPUT_DIR. Evaluate a model using the tlt-evaluate command:
tlt-evaluate {classification,detectnet_v2,faster_rcnn,ssd,dssd,retinanet,yolo,mask_rcnn} [-h] [<arguments for classification/detectnet_v2/faster_rcnn/ssd/dssd/retinanet/yolo/mask_rcnn>]
Required Arguments
{classification, detectnet_v2, faster_rcnn, ssd, dssd, retinanet, yolo, mask_rcnn}: Choose whether you are evaluating a classification, detectnet_v2, ssd, dssd, yolo, retinanet, faster_rcnn, or mask_rcnn model.
Optional Arguments
These arguments vary depending on whether you are evaluating a Classification, DetectNet_v2, SSD, DSSD, RetinaNet, YOLOv3, FasterRCNN, or MaskRCNN model.
Execute tlt-evaluate on a classification model.
tlt-evaluate classification [-h] -e <experiment_spec_file> -k <key>
Required Arguments
-e, --experiment_spec_file: Path to the experiment spec file.
-k, --key: Provide the encryption key to decrypt the model.
Optional Arguments
-h, --help: Show this help message and exit.
If you followed the example in Training a classification model, you can run the evaluation:
tlt-evaluate classification -e classification_spec.cfg -k $YOUR_KEY
TLT evaluate for classification produces the following metrics:
Loss
Top-K accuracy
Precision (P): TP / (TP + FP)
Recall (R): TP / (TP + FN)
Confusion Matrix
Execute tlt-evaluate on a DetectNet_v2 model.
tlt-evaluate detectnet_v2 [-h] -e <experiment_spec> -m <model_file> -k <key> [--use_training_set]
Required Arguments
-e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
-m, --model: Path to the model file to use for evaluation. This could be a .tlt model file or a TensorRT engine generated using the tlt-export tool.
-k, --key: Provide the encryption key to decrypt the model. This is a required argument only with a .tlt model file.
Optional Arguments
-h, --help: Show this help message and exit.
-f, --framework: The framework to use when running evaluation (choices: "tlt", "tensorrt"). By default, the framework is set to TensorRT.
--use_training_set: Set this flag to run evaluation on the training + validation dataset.
If you have followed the example in Training a Detection Model, you may now evaluate the model using the following command:
tlt-evaluate detectnet_v2 -e <path to training spec file>
-m <path to the model>
-k <key to load the model>
This command runs evaluation on the same validation set that was used during training.
Use these steps to evaluate on a test set with ground truth labeled:
Create tfrecords for this test set by following the steps listed in the data input section (an example command is sketched after the spec snippet below).
Update the dataloader configuration part of the training spec file to include the newly generated tfrecords. For more information on the dataset config, please refer to Create an experiment spec file. You may create the tfrecords with any partition mode (sequence/random). The evaluate tool iterates through all the folds in the tfrecords patterns mentioned in the validation_data_source.
dataset_config {
data_sources: {
tfrecords_path: "<path to training tfrecords root>/<tfrecords_name*>"
image_directory_path: "<path to training data root>"
}
image_extension: "jpg"
target_class_mapping {
key: "car"
value: "car"
}
target_class_mapping {
key: "automobile"
value: "car"
}
..
..
..
target_class_mapping {
key: "person"
value: "pedestrian"
}
target_class_mapping {
key: "rider"
value: "cyclist"
}
validation_data_source: {
tfrecords_path: "<path to testing tfrecords root>/<tfrecords_name*>"
image_directory_path: "<path to testing data root>"
}
}
The rest of the experiment spec file remains the same as the training spec file.
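As a concrete illustration of these two steps, you could convert the labeled test set to tfrecords and then re-run evaluation against the updated spec file. The paths below are placeholders, and the tlt-dataset-convert invocation assumes the dataset converter and its -d/-o arguments described in the data input section:

# Step 1 (sketch): generate tfrecords for the labeled test set
tlt-dataset-convert -d <path to test set conversion spec> -o <path to testing tfrecords root>/<tfrecords_name>

# Step 2 (sketch): evaluate using the spec whose validation_data_source points at the test tfrecords
tlt-evaluate detectnet_v2 -e <path to updated spec file> -m <path to the model> -k <key to load the model>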
To run evaluation for a FasterRCNN model, use this command:
tlt-evaluate faster_rcnn [-h] -e <experiment_spec> [-k <enc_key>]
Required Arguments
-e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
Optional Arguments
-h, --help: Show this help message and exit.
-k, --enc_key: The encoding key; this can override the one in the spec file.
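If you followed the example of training a FasterRCNN model, you can run evaluation against the same spec file used for training; the path below is only a placeholder:

tlt-evaluate faster_rcnn -e <path to training spec file> -k $YOUR_KEY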
Evaluation Metrics
For FasterRCNN, the evaluation will print out four metrics for the evaluated model: AP (average precision), precision, recall, and RPN_recall for each class in the evaluation dataset. Finally, it will also print the mAP (mean average precision) as a single metric number. Two modes are supported for computing the AP, i.e., the PASCAL VOC 2007 and 2012 metrics. This can be configured in the spec file's evaluation_config.use_voc_11_point_metric parameter. If this parameter is set to True, the AP calculation uses the VOC 2007 method; otherwise it uses the VOC 2012 method.
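For example, a spec file that selects the VOC 2007 (11-point) method would set this flag inside evaluation_config. Only use_voc_11_point_metric is taken from this section; the other fields of evaluation_config are omitted here:

evaluation_config {
  # Use the PASCAL VOC 2007 11-point AP method; set to False for the 2012 method
  use_voc_11_point_metric: True
  ..
  ..
}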
The RPN_recall metric indicates the recall capability of the RPN of the FasterRCNN model. A higher RPN_recall value means the RPN can better detect an object as foreground (but it does not say anything about which class the object belongs to, since that is delegated to the RCNN). The RPN_recall metric is mainly used for debugging accuracy issues of a FasterRCNN model.
Two Modes for tlt-evaluate
The tlt-evaluate command line for FasterRCNN has two modes. It can run with either the TLT backend or the TensorRT backend. This behavior is also controlled via the spec file. The evaluation_config in the spec file can have an optional trt_evaluation sub-field that specifies which backend tlt-evaluate will run with. By default (if the trt_evaluation sub-field is not present in evaluation_config), tlt-evaluate will use TLT as the backend. If the trt_evaluation sub-field is present, it can specify that tlt-evaluate runs with the TensorRT backend. In that case, the model used for inference can be either the .etlt model from tlt-export or the TensorRT engine file from tlt-export or tlt-converter.
To use a TensorRT engine file for TensorRT backend based tlt-evaluate, the trt_evaluation sub-field should look like this:
trt_evaluation {
trt_engine: '/workspace/tlt-experiments/data/faster_rcnn/trt.int8.engine'
max_workspace_size_MB: 2000
}
To use a .etlt model for TensorRT backend based tlt-evaluate, the trt_evaluation sub-field should look like this:
trt_evaluation {
etlt_model {
model: '/workspace/tlt-experiments/data/faster_rcnn/resnet18.epoch12.etlt'
calibration_cache: '/workspace/tlt-experiments/data/faster_rcnn/cal.bin'
}
trt_data_type: 'int8'
max_workspace_size_MB: 2000
}
If the TensorRT inference data type is not INT8, the calibration_cache sub-field that provides the path to the INT8 calibration cache is not needed. In the INT8 case, the calibration cache should be generated via the tlt-export command line in INT8 mode. See also the documentation of the FasterRCNN spec file for details of the trt_evaluation message structure.
To run evaluation for an SSD model, use this command:
tlt-evaluate ssd [-h] -e <experiment_spec_file> -m <model_file> -k <key>
Required Arguments
-e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
-m, --model: Path to the model file to use for evaluation.
-k, --key: Provide the key to load the model.
Optional Arguments
-h, --help: Show this help message and exit.
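As with classification and DetectNet_v2, a concrete invocation might look like the following; the file paths are placeholders, and the same pattern applies to the DSSD, YOLOv3, and RetinaNet commands below:

tlt-evaluate ssd -e <path to training spec file> -m <path to the trained .tlt model> -k $YOUR_KEY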
To run evaluation for a DSSD model, use this command:
tlt-evaluate dssd [-h] -e <experiment_spec_file> -m <model_file> -k <key>
Required Arguments
-e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
-m, --model: Path to the model file to use for evaluation.
-k, --key: Provide the key to load the model.
Optional Arguments
-h, --help: Show this help message and exit.
To run evaluation for a YOLOv3 model, use this command:
tlt-evaluate yolo [-h] -e <experiment_spec_file> -m <model_file> -k <key>
Required Arguments
-e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
-m, --model: Path to the model file to use for evaluation.
-k, --key: Provide the key to load the model.
Optional Arguments
-h, --help: Show this help message and exit.
To run evaluation for a RetinaNet model, use this command:
tlt-evaluate retinanet [-h] -e <experiment_spec_file> -m <model_file> -k <key>
Required Arguments
-e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
-m, --model: Path to the model file to use for evaluation.
-k, --key: Provide the key to load the model.
Optional Arguments
-h, --help: Show this help message and exit.
To run evaluation for a MaskRCNN model, use this command:
tlt-evaluate mask_rcnn [-h] -e <experiment_spec_file> -m <model_file> -k <key>
Required Arguments
-e, --experiment_spec_file: Experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
-m, --model: Path to the model file to use for evaluation.
-k, --key: Provide the key to load the model. This argument is not required if -m is followed by a TensorRT engine.
Optional Arguments
-h, --help: Show this help message and exit.
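To illustrate the note about the -k argument, the two invocations below are sketches with placeholder paths: the first evaluates an encrypted .tlt model and therefore passes the key, while the second points -m at a TensorRT engine and omits it.

# Evaluate a .tlt model (key required)
tlt-evaluate mask_rcnn -e <path to training spec file> -m <path to the .tlt model> -k $YOUR_KEY

# Evaluate a TensorRT engine generated with tlt-export or tlt-converter (no key needed)
tlt-evaluate mask_rcnn -e <path to training spec file> -m <path to the TensorRT engine>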