3D Multi Camera Detection and Tracking (Sparse4D)#

The Real Time Video Intelligence CV Microservice leverages NVIDIA DeepStream SDK to generate metadata for each stream that downstream microservices can use to generate spatial metrics and alerts.

The microservice features metropolis-perception-app, a DeepStream pipeline that builds on the deepstream-test5 reference application in the DeepStream SDK. This perception app provides a complete application that takes streaming video inputs, decodes the incoming streams, performs inference and tracking, and sends the metadata to other microservices using the defined Protobuf schema.

The application features a modular architecture that integrates preprocessing plugins and a custom video template plugin (Sparse4D) specifically designed for multi-object detection and tracking in three-dimensional space.

At its core, Sparse4D performs Bird's-Eye-View (BEV) 3D object detection and temporal tracking across multiple synchronized camera sensors within defined BEV groups. Refer to the TAO Sparse4D Finetuning page for more details on the model architecture and training process.

The pipeline ingests multi-camera video streams, processes them through calibrated projection matrices for spatial alignment, and uses a feedback mechanism with temporal instance banking to maintain object identity across frames. Detection results include 3D position, orientation, velocity, and instance IDs, enabling sophisticated multi-camera fusion capabilities.

For detailed information on all components, APIs, and customization options, refer to the Object Detection and Tracking documentation.

The diagram below shows the perception pipeline used in the microservice:

Metropolis Perception App Architecture

Configurations#

The Perception microservice requires several configuration files that control various aspects of the 3D multi-camera detection and tracking system. These files allow users to customize the system’s behavior according to their specific requirements.

Inference Configuration File#

The main configuration file for Sparse4D, config.yaml, handles properties related to model inference and controls the core functionality of the 3D detection and tracking system. The configuration parameters are organized into the following categories:

Inference Properties#

| Parameter | Description | Default Value |
| --- | --- | --- |
| `onnx_file` | Path to the ONNX model file | `/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/sparse4d_tracking_aic25v0.5_cosmosv0.1_warehousev3_classes_iter_46400_v2.0.onnx` |
| `engine_file` | Path to the TensorRT engine file | `/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/model.engine` |
| `labels_file` | Path to the object class labels file | `/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/labels.txt` |
| `gpu_id` | GPU ID to use for tensor operations | `0` |
| `batch_size` | Sparse4D model batch size | `1` |
| `num_sensors` | Number of camera sensors (DeepStream batch size) | `4` |
| `enable_fp16` | Enable FP16 precision (faster but less accurate) | `True` |
| `force_engine_rebuild` | Force a rebuild of the engine even if it exists | `False` |
| `feedback` | Enable the temporal feedback mechanism | `True` |
| `partial_batch` | Use partial batches for inference | `True` |
| `interval` | Number of consecutive batches to be skipped for inference | `0` |

Calibration Properties#

| Parameter | Description | Default Value |
| --- | --- | --- |
| `calib_file_path` | Path to the calibration file | `/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/calibration.json` |
| `bev_group_name` | BEV group name for camera grouping | `bev-sensor-1` |
| `calib_mode` | Calibration mode (`synthetic` or `real`) | `synthetic` |
| `use_camera_groups` | Groups cameras for multi-view fusion in synthetic mode | `True` |
| `recentering` | The model is trained and inferred in BEV coordinates. This flag re-centers the 3D bounding boxes from the original OV coordinates to the BEV coordinates; the Perception microservice converts the final output back to OV coordinates. Keep this flag enabled. | `True` |

Preprocessing Properties#

| Parameter | Description | Default Value |
| --- | --- | --- |
| `preprocessed_height` | Height of preprocessed images | `540` |
| `preprocessed_width` | Width of preprocessed images | `960` |
| `aug_configs` | Dictionary of image augmentation parameters | See below |

Augmentation Configuration:

```yaml
aug_configs:
  resize: 0.5
  resize_dims: [960, 540]
  crop: [0, 0, 960, 540]  # [x, y, width, height]
  flip: False
  rotate: 0
  rotate_3d: 0
```
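The resize and crop values above interact: the resize factor is applied first, then the `[x, y, width, height]` crop is taken from the resized frame. The sketch below illustrates that relationship; the helper function is illustrative, not the plugin's actual API.

```python
# Sketch: how resize and crop in aug_configs combine, assuming a
# resize-then-crop order (function name is hypothetical).

def preprocessed_shape(src_w, src_h, resize, crop):
    """Apply a uniform resize factor, then a [x, y, width, height] crop."""
    resized_w, resized_h = int(src_w * resize), int(src_h * resize)
    x, y, crop_w, crop_h = crop
    # The crop region must lie inside the resized frame
    assert x + crop_w <= resized_w and y + crop_h <= resized_h
    return crop_w, crop_h

# A 1920x1080 source resized by 0.5 yields 960x540, which the
# full-frame crop [0, 0, 960, 540] leaves unchanged
print(preprocessed_shape(1920, 1080, 0.5, [0, 0, 960, 540]))  # -> (960, 540)
```

With the default values, the crop spans the entire resized frame, so the preprocessed size matches `resize_dims`.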

Instance Bank Properties#

| Parameter | Description | Default Value |
| --- | --- | --- |
| `num_anchor` | Number of anchors | `900` |
| `embed_dims` | Embedding dimensions | `256` |
| `anchor` | Path to the anchor file | `/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/_ov_kmeans900_v0.5.npy` |
| `anchor_handler` | Anchor handler implementation | `SparseBox3DKeyPointsGenerator` |
| `num_temp_instances` | Number of temporal instances to track | `600` |
| `default_time_interval` | Default time interval for temporal anchor projection | `0.05` |
| `confidence_decay` | Confidence decay rate for temporal tracking | `0.8` |
| `anchor_grad` | Enable anchor gradients | `False` |
| `feat_grad` | Enable feature gradients | `False` |
| `max_time_interval` | Maximum time interval to maintain object identity | `2` |
| `reid_dims` | Re-identification dimensions | `-1` |
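To build intuition for `confidence_decay`, the sketch below shows how a tracked instance's confidence shrinks geometrically while it goes unmatched. The drop-off shown is purely illustrative; the plugin's exact retention logic may differ.

```python
# Illustration of geometric confidence decay with confidence_decay = 0.8.
# This is a simplified model of temporal tracking behavior, not the
# plugin's actual implementation.

CONFIDENCE_DECAY = 0.8

def decayed_confidence(initial_conf: float, frames_unmatched: int) -> float:
    """Confidence after n consecutive unmatched frames."""
    return initial_conf * CONFIDENCE_DECAY ** frames_unmatched

# A detection scored at 0.9 decays toward zero while unmatched:
for n in range(5):
    print(n, round(decayed_confidence(0.9, n), 4))
```

A higher decay value keeps stale instances alive longer (better through occlusions, slower to drop false tracks); a lower value does the opposite.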

Decoder Properties#

| Parameter | Description | Default Value |
| --- | --- | --- |
| `num_output` | Maximum number of bounding boxes to output | `300` |
| `score_threshold` | Score threshold for decoding output | `0.1` |
| `num_torch_threads` | Number of threads for PyTorch | `0` |

Debugging Properties#

| Parameter | Description | Default Value |
| --- | --- | --- |
| `log_level` | Logging level: `info` (all messages), `warn` (warnings and errors), `error` (errors only) | `error` |
| `display_partial_batch_info` | Display partial batch information | `False` |
| `display_tensor_info` | Whether to display tensor information | `False` |
| `display_bbox_list_info` | Whether to display bbox list information | `False` |
| `max_objects_to_display` | Maximum number of objects to display in the bbox list | `5` |
| `dump_frames` | Enable/disable frame dumping for debugging | `False` |
| `dump_max_frames` | Number of frames to dump from each batch | `50` |
| `gpu_postprocess` | Whether to perform postprocessing (instance bank and decoder) on the GPU | `False` |
| `trt_logger_severity` | TensorRT logger severity level: `0`=INTERNAL_ERROR, `1`=ERROR, `2`=WARNING, `3`=INFO, `4`=VERBOSE, `5`=show all messages | `2` |
| `enable_profiler` | Enable the TensorRT layer-wise profiler for performance analysis | `False` |

DeepStream Configuration File#

The DeepStream main configuration file (ds-main-config.txt) builds on the DeepStream test5 application configuration and provides essential settings for the overall pipeline. This file controls various aspects of the application, including source configuration, stream multiplexing, message broker settings, and visualization parameters.

Key Configuration Sections#

| Section | Purpose |
| --- | --- |
| `[application]` | Controls performance measurement settings and global application parameters |
| `[source-list]` | Defines input sources (RTSP streams), sensor IDs, and source management settings |
| `[source-attr-all]` | Configures source attributes such as latency handling and reconnection parameters |
| `[streammux]` | Sets stream multiplexer parameters for batch processing and timestamp handling |
| `[sink0]` to `[sink3]` | Configures the output sinks (visualization, messaging, file output) |
| `[pre-process]` | Links to the preprocessing configuration file for input transformation |
| `[primary-gie]` | Specifies the custom video template plugin configuration for Sparse4D |

For a complete understanding of all configuration options, refer to the DeepStream SDK Documentation.

Preprocess Plugin Configuration File#

The gst-nvdspreprocess plugin performs all preprocessing operations such as resizing, scaling, cropping, format conversion, and normalization on incoming video frames before feeding them to the neural network. These preprocessing operations prepare the input data according to the specific requirements of the Sparse4D model. You can configure all these preprocessing operations via the preprocess configuration file ds-mtmc-preprocess-config.txt.

For detailed information about the preprocessing configuration parameters and options, refer to the DeepStream Preprocessing Plugin Documentation.

Custom Video Template Plugin Configuration File#

The ds-mtmc-videotemplate_custom_lib_config.txt file configures the custom video template plugin for the Perception microservice.

```ini
gpu-id=0
customlib-name=/opt/nvidia/deepstream/deepstream-8.0/lib/libnvdsgst_sparse4d.so
```

The configuration specifies which GPU to use for processing and the path to the custom Sparse4D library that implements the 3D detection and tracking functionality.

Kafka Configuration File#

The ds-kafka-config.txt file configures the Kafka producer and consumer settings for the Perception microservice.

```ini
[message-broker]
partition-key = sensorId
```

The partition key setting ensures that messages from the same sensor go to the same Kafka partition, maintaining message ordering per camera stream.
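This ordering guarantee follows from key-based partitioning: every message with the same key hashes to the same partition, and Kafka preserves order within a partition. The sketch below illustrates the idea with a simplified stand-in for Kafka's actual murmur2 key hash.

```python
# Sketch of key-based partition assignment. The hash here is a simplified
# stand-in for Kafka's murmur2 partitioner; the principle is the same:
# identical keys always map to the same partition.

def partition_for(sensor_id: str, num_partitions: int) -> int:
    return sum(sensor_id.encode()) % num_partitions  # simplified stand-in

keys = ["Camera1", "Camera2", "Camera1", "Camera3", "Camera1"]
assignments = [partition_for(k, 4) for k in keys]

# All "Camera1" messages land on the same partition index, so
# per-camera message ordering is preserved.
assert assignments[0] == assignments[2] == assignments[4]
print(assignments)
```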

Calibration File#

The calibration.json file defines camera calibration parameters essential for 3D spatial understanding. This file contains intrinsic and extrinsic camera parameters for each camera in the system. For detailed information about camera calibration parameters and setup, refer to the calibration documentation.

Labels File#

The labels.txt file contains the class labels that the model can detect along with class-wise confidence thresholds for filtering detections. Each entry in the file corresponds to a class that the model recognizes with its associated confidence threshold. Users can customize this file to match their specific detection requirements and fine-tune the detection sensitivity for each object class.

The file format uses a semicolon-separated list where each entry follows the pattern ClassName:ConfidenceThreshold:

```text
Person:0.85;Fourier_GR1_T2_Humanoid:0.85;Agility_Digit_Humanoid:0.85;Nova_Carter:0.90;Transporter:0.90;Forklift:0.75;
```

These class labels represent common objects in industrial and warehouse environments that the Sparse4D model can detect and track in 3D space. Classes can be customized by training on domain-specific datasets using TAO Toolkit.
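The ClassName:ConfidenceThreshold format is simple enough to parse in a few lines. The helper below is a hypothetical illustration of how such a file could be read, not the plugin's own parser.

```python
# Illustrative parser for the semicolon-separated
# ClassName:ConfidenceThreshold labels format (hypothetical helper;
# the plugin's internal parsing may differ).

def parse_labels(text: str) -> dict:
    """Return a {class_name: confidence_threshold} mapping."""
    labels = {}
    for entry in text.strip().split(";"):
        if entry:  # skip the empty trailing entry after the final ';'
            name, threshold = entry.split(":")
            labels[name] = float(threshold)
    return labels

line = "Person:0.85;Forklift:0.75;"
print(parse_labels(line))  # {'Person': 0.85, 'Forklift': 0.75}
```

Raising a class's threshold suppresses low-confidence detections of that class; lowering it increases recall at the cost of more false positives.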

Anchor File#

The /opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/_ov_kmeans900_v0.5.npy file provides the initial anchor parameters for the Sparse4D model.
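When swapping anchor files, it can help to sanity-check the array against the `num_anchor` setting in config.yaml. The sketch below uses a synthetic array as a stand-in for the shipped file, and the per-anchor parameter count (11) is an assumption for illustration only.

```python
# Sketch: inspecting an anchor .npy file with NumPy. A synthetic array
# stands in for _ov_kmeans900_v0.5.npy; the second dimension (11) is an
# assumed per-anchor parameter count, not a documented value.
import numpy as np

anchors = np.random.rand(900, 11).astype(np.float32)  # stand-in array
np.save("/tmp/anchors_demo.npy", anchors)

loaded = np.load("/tmp/anchors_demo.npy")
# The first dimension should agree with num_anchor in config.yaml (900)
assert loaded.shape[0] == 900
print(loaded.shape, loaded.dtype)
```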

Runtime Configuration Adjustments#

This section covers common configuration changes that users may need to make when adapting the Sparse4D pipeline for their specific deployment scenarios.

Modifying the Number of Input Streams#

When updating the number of input streams/cameras, you must update the batch size settings in multiple configuration files to ensure consistency. This step is critical for the proper operation of the Sparse4D pipeline.

For Docker Compose deployments, the related DeepStream configuration files are located in deployments/warehouse/warehouse-3d-app/deepstream/configs/.

Updating the Inference Configuration#

Modify the config.yaml file to update the number of sensors:

```yaml
num_sensors: 4  # Change this to match the number of camera streams
```

Updating the DeepStream Configuration#

Modify the ds-main-config.txt file to update source and batch size settings:

```ini
[source-list]
# Set num-source-bins to match the number of streams
num-source-bins=4
# Update the source URLs, sensor IDs, and names accordingly
list=rtsp://server1:port/stream1;rtsp://server1:port/stream2;...
sensor-id-list=Camera1;Camera2;Camera3;...
sensor-name-list=Camera1;Camera2;Camera3;...

[streammux]
# Change batch-size to the number of streams
batch-size=4
```

Updating the Preprocess Configuration#

Modify the ds-mtmc-preprocess-config.txt file to update the network input shape:

```ini
# Update the first value of network-input-shape to match the number of camera streams
network-input-shape=4;3;540;960
```
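The three settings above must stay in sync: `num_sensors` in config.yaml, `batch-size` under `[streammux]` in ds-main-config.txt, and the first field of `network-input-shape` in ds-mtmc-preprocess-config.txt. A small consistency check like the hypothetical one below can catch mismatches before deployment.

```python
# Hedged sketch of a pre-deployment consistency check across the three
# stream-count settings. The function and its parsing are illustrative;
# they are not part of the shipped tooling.

def check_stream_count(num_sensors: int, streammux_batch: int,
                       network_input_shape: str) -> bool:
    """True when all three stream-count settings agree."""
    first_dim = int(network_input_shape.split(";")[0])
    return num_sensors == streammux_batch == first_dim

# Consistent configuration for 4 camera streams:
assert check_stream_count(4, 4, "4;3;540;960")
# Mismatched streammux batch size would be flagged:
assert not check_stream_count(4, 8, "4;3;540;960")
```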

Integrating a Sparse4D Model Checkpoint#

The Sparse4D plugin supports model swapping, allowing you to use different Sparse4D models based on your specific use case.

Model Compatibility Requirements#

Ensure your new model meets these requirements:

  • Uses the same input tensor names and shapes as the plugin expects

  • Has a compatible output format with the post-processing logic

  • Follows the Sparse4D architecture pattern for 3D object detection

Updating Sparse4D Configuration#

  1. Mount the model in docker compose and modify the onnx_file setting in config.yaml to point to your new model:

```yaml
# Mount your new model file in docker compose
volumes:
  - $MDX_DATA_DIR/deepstream/models/your_new_model.onnx:/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/your_new_model.onnx

# Update the model file path in config.yaml
onnx_file: "/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/your_new_model.onnx"

# Force a TensorRT engine rebuild for the new model
force_engine_rebuild: True
```

The Perception microservice will automatically build the TensorRT engine for the new model based on the above configuration changes.

  2. Mount your new labels file in docker compose and modify the labels_file setting in config.yaml to update the class labels. List the labels in the same order that the model was trained on.

```yaml
# Mount your new labels file in docker compose: deployments/warehouse/warehouse-3d-app/warehouse-3d-app.yml
volumes:
  - $MDX_SAMPLE_APPS_DIR/warehouse-3d-app/deepstream/label/your_new_labels.txt:/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/your_new_labels.txt

# Update the labels file path in config.yaml
labels_file: "/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/your_new_labels.txt"

# Update the class labels - example your_new_labels.txt:
# Person:0.85;Fourier_GR1_T2_Humanoid:0.85;Agility_Digit_Humanoid:0.85;Nova_Carter:0.90;Transporter:0.90;Forklift:0.75;
```

  3. Mount your new anchor file in docker compose and modify the anchor setting in config.yaml to reflect the correct initialized anchors. Because model training and inference depend on the initialized anchor .npy file, you must use the right anchor file for accurate detection and tracking. If you fine-tune a model on a new scene or set of classes, use the new anchor file produced by the TAO fine-tuning process.

```yaml
# Mount your new anchor file in docker compose: deployments/warehouse/warehouse-3d-app/warehouse-3d-app.yml
volumes:
  - $MDX_DATA_DIR/deepstream/anchor/your_new_anchors.npy:/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/your_new_anchors.npy

# Update the anchor file path
anchor: "/opt/nvidia/deepstream/deepstream-8.0/sources/sparse4d/your_new_anchors.npy"
```
  4. If the resolution is changed for the new model, set the NETWORK_WIDTH and NETWORK_HEIGHT environment variables to match the new model's input resolution.

The sparse4d_setup.sh script automatically handles updating the required configuration parameters in different config files (config.yaml and ds-mtmc-preprocess-config.txt) based on these values.

```shell
# Example: set environment variables for a model with 960x540 input resolution
export NETWORK_WIDTH=960
export NETWORK_HEIGHT=540
```