# Model Fine-Tuning
The Smart City Blueprint uses two classes of models: computer vision (CV) models for object detection, based on RT-DETR and Mask Grounding DINO, and vision language models (VLMs) based on the NVIDIA CR2 model. Usage of these models is described in the Blueprint Deep Dive section.

Users can fine-tune the CV models to improve object detection accuracy for specific object classes. This section describes the procedure for fine-tuning the CV model; a fine-tuning recipe for the CR2 model is forthcoming.
## CV Model Fine-Tuning
### Fine-Tuning TrafficCamNet Transformer Lite with TAO RT-DETR
The TrafficCamNet Transformer Lite model is based on the RT-DETR (Real-Time DEtection TRansformer) architecture and can be fine-tuned using NVIDIA TAO Toolkit to improve detection accuracy for specific traffic scenarios or new object classes.
#### Prerequisites
- **NVIDIA TAO Toolkit**: Install TAO Toolkit following the official documentation.
- **Dataset**: Prepare your dataset in COCO format with:
  - A directory of training/validation images
  - Annotation files in COCO JSON format
  - A class mapping file (`labels.txt`)
- **NGC CLI**: Install the NGC CLI following the NGC CLI documentation.
- **Base Model**: Download the TrafficCamNet Transformer Lite pretrained model from the TAO Model Zoo using the NGC CLI:

```shell
ngc registry model download-version nvidia/trafficcamnet_transformer_lite:trainable_resnet50_v2.0 --destination /path/to/model
```
#### Dataset Preparation
Organize your dataset in the following structure:
```text
dataset/
├── train/
│   ├── images/
│   └── annotations.json
├── val/
│   ├── images/
│   └── annotations.json
└── labels.txt
```
The annotations should follow COCO format with bounding boxes for objects like vehicles, pedestrians, and cyclists.
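The COCO convention stores boxes as `[x, y, width, height]` in pixels and links annotations to images and categories by id. As a quick sanity check before training, the stdlib-only sketch below builds a minimal COCO-style dict (file names and classes are illustrative, not from the Blueprint) and verifies its internal consistency:

```python
import json

# Minimal COCO-style annotations for one traffic image (illustrative values).
coco = {
    "images": [{"id": 1, "file_name": "cam01_000001.jpg", "width": 1920, "height": 1080}],
    "annotations": [
        # bbox is [x, y, width, height] in pixels, per the COCO convention
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 200, 80, 60],
         "area": 4800, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "car"}, {"id": 2, "name": "person"},
        {"id": 3, "name": "bicycle"}, {"id": 4, "name": "truck"},
    ],
}

def check_coco(data: dict) -> list:
    """Return a list of consistency problems found in a COCO dict."""
    problems = []
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']} references a missing image")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']} references a missing category")
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']} has a degenerate bbox")
    return problems

print(check_coco(coco))  # []
```

The same check can be run over your real `annotations.json` by loading it with `json.load` first.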
#### Creating the Experiment Spec File
Create a YAML specification file (e.g., `rtdetr_trafficcam_finetune.yaml`) with the following configuration:
```yaml
dataset:
  train_data_sources:
    - image_dir: /path/to/dataset/train/images
      json_file: /path/to/dataset/train/annotations.json
  val_data_sources:
    image_dir: /path/to/dataset/val/images
    json_file: /path/to/dataset/val/annotations.json
  batch_size: 8
  workers: 4
  num_classes: 4  # e.g., car, person, bicycle, truck
  augmentation:
    multi_scales:
      - [480, 832]
      - [512, 896]
      - [544, 960]
      - [544, 960]
      - [544, 960]
      - [576, 992]
      - [608, 1056]
      - [672, 1184]
      - [704, 1216]
      - [736, 1280]
      - [768, 1344]
      - [800, 1408]
    train_spatial_size: [544, 960]
    eval_spatial_size: [544, 960]
    distortion_prob: 0.8
    iou_crop_prob: 0.8
model:
  backbone: resnet_50
  train_backbone: true
  pretrained_backbone_path: /path/to/trafficcamnet_transformer_lite.pth
  num_queries: 300
  dec_layers: 6
  enc_layers: 1
train:
  optim:
    lr: 0.0001
    lr_backbone: 0.00001
    lr_scheduler: MultiStep
    lr_steps: [30, 45]
    lr_decay: 0.1
    optimizer: AdamW
    weight_decay: 0.0001
  num_epochs: 50
  checkpoint_interval: 5
  validation_interval: 5
  num_gpus: 1
  gpu_ids: [0]
  pretrained_model_path: /path/to/trafficcamnet_transformer_lite.pth
```
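`num_classes` in the spec must match the categories in your COCO annotations, and `labels.txt` should list the class names in category-id order. A small stdlib sketch that derives both from `annotations.json` (the paths follow the dataset layout above and are placeholders):

```python
import json
from pathlib import Path

# Placeholder paths matching the dataset layout shown earlier.
ann_path = Path("dataset/train/annotations.json")
labels_path = Path("dataset/labels.txt")

def derive_labels(annotations: dict) -> list:
    """Class names ordered by COCO category id, one per labels.txt line."""
    cats = sorted(annotations["categories"], key=lambda c: c["id"])
    return [c["name"] for c in cats]

if ann_path.exists():
    names = derive_labels(json.loads(ann_path.read_text()))
    labels_path.write_text("\n".join(names) + "\n")
    # This count must match dataset.num_classes in the spec file.
    print(f"num_classes: {len(names)}")
```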
#### Running Fine-Tuning
Execute the training command:
```shell
tao model rtdetr train -e rtdetr_trafficcam_finetune.yaml \
  results_dir=/results/trafficcam_finetuned
```
The training process will:
- Load the pretrained TrafficCamNet Transformer Lite weights
- Fine-tune on your custom dataset
- Save checkpoints every 5 epochs
- Run validation every 5 epochs
#### Resuming Training
If training is interrupted, resume from a checkpoint:
```shell
tao model rtdetr train -e rtdetr_trafficcam_finetune.yaml \
  train.resume_training_checkpoint_path=/results/trafficcam_finetuned/model_epoch_020.pth
```
#### Evaluating the Fine-Tuned Model
After training, evaluate the model on your validation set:
```shell
tao model rtdetr evaluate -e rtdetr_trafficcam_finetune.yaml \
  evaluate.checkpoint=/results/trafficcam_finetuned/model_epoch_050.pth
```
This will output mAP (mean Average Precision) and per-class detection metrics.
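mAP is built on Intersection-over-Union: a prediction counts as a true positive when its IoU with a ground-truth box exceeds a threshold (commonly 0.5). As a reference for interpreting the metrics, a minimal IoU sketch for COCO-style `[x, y, w, h]` boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two [x, y, w, h] boxes (COCO convention)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap extents clamp to zero when the boxes are disjoint.
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 10, 10], [5, 0, 10, 10]))  # 0.333...: half-overlapping boxes
```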
#### Exporting for Deployment
Export the fine-tuned model to ONNX format for deployment:
```shell
tao model rtdetr export -e rtdetr_trafficcam_finetune.yaml \
  export.checkpoint=/results/trafficcam_finetuned/model_epoch_050.pth \
  export.onnx_file=/results/trafficcam_finetuned.onnx \
  export.input_width=960 \
  export.input_height=544
```
The exported ONNX model can then be converted to TensorRT for optimized inference in production environments.
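The export flags fix the model's input geometry, so inference pipelines must allocate buffers for that shape. Assuming the usual NCHW RGB layout (an assumption about the deployment pipeline, not something stated by the export command), the arithmetic is:

```python
# Input-buffer sizing for the exported geometry (input_width=960,
# input_height=544), assuming NCHW RGB input. Plain arithmetic only;
# nothing here is queried from ONNX Runtime or TensorRT.
def input_buffer_bytes(batch: int, height: int, width: int, bytes_per_elem: int) -> int:
    channels = 3  # RGB
    return batch * channels * height * width * bytes_per_elem

print((1, 3, 544, 960))                      # expected NCHW input shape
print(input_buffer_bytes(1, 544, 960, 4))    # FP32 buffer size in bytes
print(input_buffer_bytes(1, 544, 960, 2))    # FP16 buffer is half that
```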
#### Fine-Tuning Tips
- **Learning Rate**: Start with a lower learning rate (0.0001) when fine-tuning from a pretrained model.
- **Data Augmentation**: Adjust `distortion_prob` and `iou_crop_prob` based on your dataset characteristics.
- **Epochs**: Monitor validation mAP; typically 30-50 epochs are sufficient for fine-tuning.
- **Backbone Freezing**: Set `train_backbone: false` to freeze backbone weights if you have limited data.
- **Multi-GPU Training**: Increase `num_gpus` and adjust `batch_size` proportionally for faster training.
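When scaling batch size with GPU count, a common heuristic is the linear scaling rule: keep the per-GPU batch fixed and scale the learning rate with the global batch size. This is a rule of thumb, not a TAO requirement; the baseline numbers below match the single-GPU spec above:

```python
# Linear scaling heuristic: global batch and learning rate both grow
# linearly with GPU count. Baselines match the spec (batch_size: 8,
# lr: 0.0001); treat the result as a starting point, not a guarantee.
def scaled_hparams(num_gpus: int, base_batch: int = 8, base_lr: float = 1e-4):
    global_batch = base_batch * num_gpus
    lr = base_lr * num_gpus
    return global_batch, lr

print(scaled_hparams(1))  # (8, 0.0001)
print(scaled_hparams(4))  # (32, 0.0004)
```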
For more detailed information on RT-DETR configuration options, refer to the TAO RT-DETR documentation.