Object Detection and Tracking#

Overview#

The Real Time Video Intelligence CV Microservice leverages NVIDIA DeepStream SDK to generate metadata for each stream that downstream microservices can use to generate spatial metrics and alerts.

The microservice features rtvi-cv-app, a DeepStream pipeline that builds on the deepstream-test5 reference app in the DeepStream SDK. The RTVI-CV app takes streaming video inputs, decodes the incoming streams, performs inference and tracking, and sends the metadata to other microservices using the defined Protobuf schema.

The Real Time Video Intelligence CV Microservice supports both 2D single-camera detection models (RT-DETR, Grounding DINO) for object detection and classification, and a 3D multi-camera model (Sparse4D) for birds-eye-view detection and tracking. All models are integrated within DeepStream pipelines, providing a complete streaming analytics solution for AI-based video understanding.

Key Features#

  • Real-time Performance: TensorRT/Triton-accelerated inference

  • Multi-model Support: Flexible architecture supporting different detection models

  • DeepStream Integration: Built on NVIDIA’s proven streaming analytics framework

  • Scalable Architecture: Handles multiple camera streams with batch processing

  • Standardized Output: Consistent metadata schema for downstream processing

  • Production-Ready: Configurable pipelines with comprehensive monitoring

Architecture#

The Real Time Video Intelligence Microservice follows a modular, pipeline-based architecture built on NVIDIA DeepStream SDK. The architecture supports both 2D single-camera and 3D multi-camera detection pipelines.

Real Time Video Intelligence Microservice Architecture

Core Components#

  • Video Source: Handles multiple RTSP streams, file inputs with dynamic stream add/remove capabilities

  • Stream Multiplexer (nvstreammux): Batches video frames from multiple sources for efficient GPU processing

  • Preprocessor: Hardware-accelerated image transformation, normalization, and augmentation using nvdspreprocess plugin

  • Inference Engine: Supports both TensorRT (nvinfer) and Triton Inference Server (nvinferserver) backends for model execution

  • Tracker: Multi-object tracker for maintaining object identities across frames

  • Metadata Generator: Converts detection outputs to standardized protobuf format

  • Message Broker: Kafka producer for streaming metadata to downstream microservices

Models Supported#

The Real Time Video Intelligence CV Microservice supports both 2D single-camera and 3D multi-camera detection models:

2D Single-Camera Models:

  • Mask-Grounding-DINO (Smart City Blueprint): Open vocabulary multi-modal object detection model trained on commercial data with language grounding for zero-shot detection using natural language text prompts

  • RT-DETR (Smart City Blueprint): Object detection model included in the TAO Toolkit, transformer-based end-to-end detector optimized for real-time performance

  • RT-DETR (Warehouse Blueprint): Real-Time Detection Transformer object detection model optimized for warehouse environments

3D Multi-Camera Model:

  • Sparse4D (Warehouse Blueprint): Multi-Camera 3D Detection and Tracking model with 4D (spatial-temporal) capabilities for Birds-Eye-View (BEV) detection across multiple synchronized camera sensors with temporal instance banking

API Reference#

The Real Time Video Intelligence CV (RTVI-CV) Microservice exposes a REST API for stream management, health checks, metrics, and AI/ML operations.

For complete API documentation, including all endpoints, request/response schemas, and interactive examples, see the Object Detection and Tracking API Reference.

API categories:

  • Health Check — Liveness, readiness, and startup probes (Kubernetes-compatible)

  • Stream Management — Add, remove, and query video streams dynamically

  • Monitoring — Metrics and telemetry with Prometheus and OpenTelemetry support

  • Metadata — Service version and license information

  • AI/ML Operations — Text embedding generation and other ML capabilities, e.g. POST /api/v1/generate_text_embeddings to generate vector embeddings from text

All endpoints are prefixed with /api/v1. Base URL: http://<host>:9000.

ReID and Embeddings (REST API and Config Reference)#

This section describes deployment, features, configuration, and REST APIs for text embeddings, object embeddings (vision encoder), adding video streams by URL, and attaching timestamps from the API payload.

Supported Models#

Component – Model mapping#

| Component | Models | Backend |
| --- | --- | --- |
| Vision Encoder (RT-Embedding) | RADIO-CLIP / SigLIP V2-SO400M-P16-256 | TensorRT |
| Text Embedder | SigLIP2 (ONNX) / SigLIP2-giant | ONNX Runtime / Embedding NIM |

Combined ONNX Models (Image + Text)#

Both models below are exported as combined CLIP-style ONNX files containing image and text encoders in a single graph. The plugins automatically extract the relevant subgraph (image-only for vision encoder, text-only for text embedder).

| Model | Type | Image Size | Text Max Length | Embedding Dim | Tokenizer | Extra Inputs |
| --- | --- | --- | --- | --- | --- | --- |
| RADIO-CLIP | RADIO-CLIP (combined image+text) | 224x224 | 77 | 1024 | CLIPTokenizer (BPE) | input_ids |
| SigLIP2 | SigLIP V2-SO400M-P16-256 | 256x256 | 64 | 1152 | GemmaTokenizer (SentencePiece) | input_ids, attention_mask |

Model downloads (NGC) – deployable ONNX#

Features added#

  1. Text embeddings using RADIO-CLIP ONNX or SigLIP2 ONNX (config + REST API).

  2. Object embeddings using RADIO-CLIP / SigLIP2 (vision encoder plugin with TensorRT).

  3. Combined ONNX model support – a single ONNX file serves both image and text embeddings; the plugins automatically extract the relevant subgraph.

  4. Add a file video URL via curl, including support for specifying the creation time of the file (see the stream add API and streammux config below).

Text embedder (config)#

Enable the text embedder in your config file. The model-name property selects the encoder backend.

Text embedder property reference#

| Property | Description |
| --- | --- |
| ``enable`` | Enable the text embedder (1 = on, 0 = off). |
| ``model-name`` | Use siglip2-onnx for ONNX (RADIO-CLIP or SigLIP2; set onnx-model-path and tokenizer-dir accordingly). |
| ``onnx-model-path`` | Path to the combined ONNX model file (required for siglip2-onnx). Relative paths are resolved from the config file location. |
| ``tokenizer-dir`` | Path to the tokenizer directory containing tokenizer.json (required for siglip2-onnx). Relative paths are resolved from the config file location. |
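A minimal ``[text-embedder]`` section might look like the following sketch. The model and tokenizer paths are illustrative placeholders and must point at the files you actually downloaded:

```ini
[text-embedder]
enable=1
model-name=siglip2-onnx
; Paths below are examples; relative paths resolve from the config file location.
onnx-model-path=models/siglip2_v1.0.onnx
tokenizer-dir=models/siglip2_v1.0_tokenizer
```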

Generate text embeddings (curl)#

Endpoint: POST http://localhost:9000/api/v1/generate_text_embeddings

Example:

curl -XPOST http://localhost:9000/api/v1/generate_text_embeddings -d '{
    "text_input": "Hello, world!",
    "model": ""
}'

| Field | Description |
| --- | --- |
| text_input | Input text to embed. |
| model | Currently ignored and can be left empty; reserved for future use. |

Video URL – add stream (curl)#

Endpoint: POST http://localhost:9000/api/v1/stream/add

Use this to register a video URL for download and add it as a stream. The payload can include creation_time; to use it as the stream timestamp, set [streammux] attach-sys-ts-as-ntp=0 (see section below).

Example:

curl -XPOST 'http://localhost:9000/api/v1/stream/add' -d '{
  "key": "sensor",
  "value": {
      "camera_id": "uniqueSensorID1",
      "camera_name": "front_door",
      "camera_url": "http://localhost:30000/sample_720p.mp4",
      "creation_time": "2024-12-12T18:32:11.123Z",
      "change": "camera_add",
      "metadata": {
          "resolution": "1920 x1080",
          "codec": "h264",
          "framerate": 30
      }
  },
  "headers": {
      "source": "vst",
      "created_at": "2021-06-01T14:34:13.417Z"
  }
}'

| Field | Description |
| --- | --- |
| key | Message key, e.g. "sensor". |
| value.camera_id | Unique sensor/stream identifier. |
| value.camera_name | Human-readable name (e.g. front_door). |
| value.camera_url | Video URL to download and add as a stream. |
| value.creation_time | Timestamp (e.g. ISO 8601); used when attaching the timestamp from the payload (see section below). |
| value.change | Change type, e.g. "camera_add". |
| value.metadata | Optional (resolution, codec, framerate, etc.). |
| headers | Optional request metadata. |

Attach creation_time (base time of files) from REST API as timestamp (config)#

To use the ``creation_time`` from the REST API payload (e.g. from /api/v1/stream/add) as the stream timestamp instead of system/NTP time:

[streammux]
attach-sys-ts-as-ntp=0

  • ``attach-sys-ts-as-ntp=0`` – use the timestamp provided in the REST API payload (e.g. creation_time).

  • ``attach-sys-ts-as-ntp=1`` (default) – use system/NTP timestamp.

Ensure the stream-add payload includes a valid creation_time when using this option.
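Before sending the stream-add request, it can help to check that creation_time parses as the ISO 8601 form shown in the example payload. The sketch below is illustrative; the strict format string (millisecond precision, trailing Z) is an assumption based on that example:

```python
from datetime import datetime, timezone

def parse_creation_time(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp like '2024-12-12T18:32:11.123Z'.

    Uses strptime for portability across Python versions; %f consumes the
    fractional seconds and the literal 'Z' is matched as UTC.
    """
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return dt.replace(tzinfo=timezone.utc)

t = parse_creation_time("2024-12-12T18:32:11.123Z")
print(t.year, t.hour, t.minute)  # 2024 18 32
```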

Vision encoder plugin (config)#

The vision encoder plugin generates object embeddings (e.g. for ReID) using a TensorRT engine built from an ONNX model.

Combined ONNX model support: When a combined image+text ONNX model (e.g. RADIO-CLIP or SigLIP2) is provided, the TensorRT engine builder automatically:

  1. Detects multiple outputs and prunes the graph to the image_embedding output only.

  2. Relies on TensorRT’s dead-code elimination to remove the entire text encoder.

  3. Binds the extra text inputs (input_ids, attention_mask) with zero-filled buffers.

This means you can use the same ONNX file for both [visionencoder] (image embeddings via TRT) and [text-embedder] (text embeddings via ONNX Runtime).
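Because image and text embeddings come from the same model, they live in a shared space, so open-vocabulary matching downstream reduces to cosine similarity. A minimal sketch of that matching step (the toy 3-dim vectors stand in for real 1024-/1152-dim embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_match(object_emb, text_embs):
    """Return (index, score) of the text embedding closest to the object."""
    scores = [cosine_similarity(object_emb, t) for t in text_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

obj = [1.0, 0.0, 0.2]                      # toy object embedding
prompts = [[0.9, 0.1, 0.1], [0.0, 1.0, 0.0]]  # toy text embeddings
idx, score = best_match(obj, prompts)
print(idx)  # 0 (the first prompt is the closer one)
```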

Example: RADIO-CLIP#

[visionencoder]
enable=1
onnx-model=radio_clip_v1.0.onnx
tensorrt-engine=radio_clip_v1.0.engine
batch-size=16
min-crop-size=32
gpu-id=0
skip-interval=3

Property reference#

| Property | Description |
| --- | --- |
| ``enable`` | Enable the vision encoder plugin (1 = on, 0 = off). |
| ``tensorrt-engine`` | Path to the TensorRT engine file. If not present, the engine is built automatically from the ONNX model. |
| ``onnx-model`` | Path to the ONNX model file. The same directory must contain the external weights .bin file. Supports both single-input (image-only) and combined (image+text) ONNX models. |
| ``batch-size`` | Batch size for TensorRT engine build and inference. |
| ``min-crop-size`` | Minimum crop size (width/height in pixels) for embedding generation; objects smaller than this are skipped. |
| ``skip-interval`` | Frame interval for embedding generation; embeddings are computed periodically rather than on every frame. |
| ``embedding-classes`` | Comma-separated list of class labels to embed (e.g. person,car); only these classes get embeddings. |
| ``gpu-id`` | GPU device ID to use. |
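The interaction of min-crop-size, skip-interval, and embedding-classes can be pictured as a single selection predicate. This is an illustrative pure-Python model of the documented behavior, not the plugin's code; in particular, reading skip-interval as "embed on every Nth frame" is our assumption:

```python
def should_embed(frame_num, obj_class, crop_w, crop_h,
                 skip_interval=3, min_crop_size=32,
                 embedding_classes=("person", "car")):
    """Decide whether an object crop gets an embedding on this frame."""
    if skip_interval and frame_num % skip_interval != 0:
        return False  # assumed semantics: only embed every Nth frame
    if min(crop_w, crop_h) < min_crop_size:
        return False  # crop smaller than min-crop-size is skipped
    if embedding_classes and obj_class not in embedding_classes:
        return False  # class not in the configured embedding-classes list
    return True

print(should_embed(0, "person", 64, 128))  # True
print(should_embed(1, "person", 64, 128))  # False: skipped frame
print(should_embed(0, "person", 16, 128))  # False: crop below min size
```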

Example: SigLIP2#

[visionencoder]
enable=1
onnx-model=siglip2_v1.0.onnx
batch-size=16
min-crop-size=32
gpu-id=0
skip-interval=3

Note: Image normalization is auto-detected from the ONNX model path: [0, 1] for RADIO-CLIP, [-1, 1] when the path contains siglip.
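The auto-detection described in the note amounts to a path check, roughly as below (an illustrative sketch of the documented rule, not the plugin's code):

```python
def normalization_range(onnx_path: str):
    """Pick the input normalization range from the ONNX model path.

    Mirrors the documented behavior: [-1, 1] when the path contains
    'siglip', otherwise [0, 1] (RADIO-CLIP).
    """
    if "siglip" in onnx_path.lower():
        return (-1.0, 1.0)
    return (0.0, 1.0)

print(normalization_range("siglip2_v1.0.onnx"))     # (-1.0, 1.0)
print(normalization_range("radio_clip_v1.0.onnx"))  # (0.0, 1.0)
```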

Combined ONNX model deployment#

Required files#

Each combined ONNX model requires three components in the same directory:

| File | Description |
| --- | --- |
| <model>.onnx | Model graph (small, ~1 MB). |
| <weights>.bin | External weights (large, ~1-4 GB). The filename must match what the ONNX file references internally. |
| <model>_tokenizer/ | Tokenizer directory containing tokenizer.json (used by the text embedder only). |

Engine rebuild#

When switching ONNX models, delete the existing .engine / .plan file and its .meta sidecar so the TensorRT engine is rebuilt with the correct output pruning:

rm -f model.plan model.plan.meta

The engine will be automatically rebuilt on next launch.

Deployment#

IGX Thor: VIC clocks for best performance

For IGX Thor, VIC clocks need to be boosted for best performance and latency. Run the following before deployment:

sudo nvpmodel -m 0
sudo jetson_clocks
sudo su
# Run the following in the root shell (after sudo su):
echo performance > /sys/class/devfreq/8188050000.vic/governor

1. Blueprint Deployment

For warehouse deployment, refer to the Warehouse Quickstart Guide. For smart city deployment, refer to the Smart City Quickstart Guide.

2. Verify Deployment

Check service health:

# Check liveness
curl http://localhost:<port>/api/v1/live

# Check readiness
curl http://localhost:<port>/api/v1/ready

# Check startup
curl http://localhost:<port>/api/v1/startup

# Get stream information
curl http://localhost:<port>/api/v1/stream/get-stream-info

# Monitor metrics
curl http://localhost:<port>/api/v1/metrics

3. Monitor Output

View detection metadata in Kafka topic or check logs for the service:

docker-compose logs -f <rtvi-cv-service-name>

4. TensorRT Engine File Creation and Reuse

On the first run, TensorRT automatically builds optimized engine files (.engine) from the ONNX models. Engine generation can take significant time depending on the model size and GPU. The engine files are stored inside the container’s storage volume (/opt/storage/ for both the 2D and 3D pipelines).

The engine files are automatically retained across Docker restarts via the mounted storage volume, so subsequent restarts will reuse the previously built engines without rebuilding.

Note

If the Docker volumes are removed, the engine files will be deleted and TensorRT will rebuild them on the next run.

Reusing engines with custom models:

When deploying a custom ONNX model (e.g. a fine-tuned RT-DETR or Sparse4D checkpoint), the engine file from the first run is retained in the storage volume. To mount a custom pre-built engine file, add a volume entry to the corresponding Docker Compose file:

Warehouse 2D Blueprint – deployments/warehouse/warehouse-2d-app/warehouse-2d-app.yml (perception-2d service, volumes: section):

volumes:
  # ... existing volume mounts ...
  - $MDX_DATA_DIR/models/mtmc/rtdetr_warehouse_v1.0.fp16.onnx_b3_gpu0_fp16.engine:/opt/storage/rtdetr_warehouse_v1.0.fp16.onnx_b3_gpu0_fp16.engine

Warehouse 3D Blueprint – deployments/warehouse/warehouse-3d-app/warehouse-3d-app.yml (perception-3d service, volumes: section):

volumes:
  # ... existing volume mounts ...
  - $MDX_DATA_DIR/models/sparse4d/ov/model.engine:/opt/storage/model.engine

Note

  • The engine file name must be preserved exactly as the blueprint expects it. For Warehouse 2D, the PGIE config references rtdetr_warehouse_v1.0.fp16.onnx_b3_gpu0_fp16.engine at /opt/storage/ (defined in ds-ppl-analytics-pgie-config.yml). For Warehouse 3D, the inference config references model.engine at /opt/storage/ (defined in config.yaml). When mounting a custom engine, ensure the file name and path match these config entries, or update the config files accordingly.

  • Engine files are tied to the GPU architecture and TensorRT version they were built on. If you change GPU hardware or update TensorRT, delete the mounted engine files and allow the application to rebuild them.

  • When switching to a different or newer custom ONNX model, remove the previously mounted or stored .engine file(s) from the storage volume so that TensorRT generates a fresh engine for the new model.

2D Single Camera Detection and Tracking#

2D models perform object detection and classification on individual camera streams, providing accurate bounding box predictions and class labels in image coordinates. These models are ideal for single-camera applications requiring high-accuracy object detection.

DeepStream Pipeline

The diagram below shows the RTVI-CV pipeline used for 2D single camera detection and tracking.

2D Single Camera Detection and Tracking Pipeline Architecture

The microservice supports multiple 2D detection models, each optimized for different use cases:

  • RT-DETR: Transformer-based end-to-end detector

  • Grounding DINO: Zero-shot detector with language grounding for open-vocabulary detection

RT-DETR Detector RTVI-CV Pipeline#

The RT-DETR (Real-Time DEtection TRansformer) detector pipeline is based on the deepstream-test5 app in the DeepStream SDK. The app takes streaming video inputs, decodes the incoming stream, performs inference & tracking, and lastly sends metadata over Kafka to other Metropolis Microservices, using the defined Protobuf schema.

RT-DETR is a transformer-based end-to-end object detector optimized for real-time performance. A finetuned RT-DETR model for warehouse blueprint is mounted at /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/metropolis_perception_app/models/mtmc/rtdetr_detection_aic25v0.41_openimages_ytcc_moving_classes_iter_012_v0.7.onnx. The model supports the following classes: Person, Agility_Digit_Humanoid, Fourier_GR1_T2_Humanoid, Nova_Carter, Transporter, Forklift, and Pallet.

A finetuned RT-DETR model for smartcity blueprint is mounted at /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/metropolis_perception_app/models/rtdetr-its/model_epoch_035.onnx. The model supports the following classes: background, two_wheeler, Vehicle, Person, and road_sign.

Configuration Options

The RT-DETR Detector RTVI-CV Pipeline has several key configuration options:

Grounding DINO Detector RTVI-CV Pipeline#

The Grounding DINO detector pipeline is based on the deepstream-test5 app in the DeepStream SDK. The app takes streaming video inputs, decodes the incoming stream, performs inference & tracking, and lastly sends metadata over Kafka to other Metropolis Microservices, using the defined Protobuf schema.

Grounding DINO is a zero-shot object detection model that combines vision and language understanding to detect objects based on free-form text descriptions (prompts). The implementation uses the DeepStream Triton Inference Server plugin (Gst-nvinferserver) with a custom processing library for text prompt support and optional instance segmentation masks. The app is enabled with PGIE (Primary GPU Inference Engines), NVDCF/DeepSORT tracker and message broker for sending metadata to Kafka.

Configuration Options

The Grounding DINO Detector RTVI-CV Pipeline has several key configuration options:

Text Prompt Configuration#

Labels for Grounding DINO are defined in the nvinferserver configuration file (config_triton_nvinferserver_gdino.txt) in the postprocess section. The text prompts enable zero-shot detection of objects using natural language descriptions.

postprocess {
  other {
   type_name: "Car . Truck . Bus . Motorcycle . Bicycle . Scooter . Emergency Vehicle . Vehicle . Person . ;0.4"
  }
}

Prompt Syntax:

  • Separate multiple objects with periods surrounded by spaces (" . ")

  • Add a semicolon (;) followed by confidence threshold (e.g., ;0.4 for 40% confidence)

  • Descriptive phrases enable fine-grained detection (e.g., “person wearing helmet”)

  • Case-insensitive processing

  • The threshold value filters detections below the specified confidence level
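Following the syntax above, a prompt string can be assembled programmatically. The helper below is illustrative (it is not part of the microservice), but its output matches the documented format:

```python
def build_gdino_prompt(labels, threshold=0.4):
    """Build a Grounding DINO text prompt.

    Labels are joined with ' . ' separators; a trailing ' . ' and the
    ';<threshold>' confidence suffix complete the string.
    """
    body = " . ".join(labels)
    return f"{body} . ;{threshold}"

prompt = build_gdino_prompt(["Car", "Truck", "Person"], 0.4)
print(prompt)  # Car . Truck . Person . ;0.4
```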

3D Multi Camera Detection and Tracking#

The 3D pipeline performs object detection and tracking across multiple synchronized camera streams using Sparse4D, a Birds-Eye-View (BEV) detection model. It maintains object identity across frames through temporal tracking with instance banking, providing 3D position, orientation, velocity, and persistent instance IDs for each detected object.

The pipeline ingests multicamera video streams, processes them through calibrated projection matrices for spatial alignment, and utilizes a feedback mechanism with temporal instance banking to maintain object identity across frames. Detection results include 3D position, orientation, velocity, and instance IDs, enabling sophisticated multi-camera fusion capabilities.

The processed metadata follows DeepStream’s standardized message format and transmits via Kafka brokers for downstream applications such as Multi-Camera Tracking (MCT), Real-Time Location Systems (RTLS), and Facility Safety Logic (FSL). The entire pipeline optimizes for real-time performance with TensorRT acceleration (FP16/FP32) and configurable batch processing, making it ideal for complex spatial understanding in applications like warehouse automation and traffic monitoring.

DeepStream Pipeline

The diagram below shows the RTVI-CV pipeline used for 3D multi camera detection and tracking.

Sparse4D RTVI-CV Pipeline Architecture

Sparse4D RTVI-CV Pipeline#

The Sparse4D RTVI-CV pipeline is based on the deepstream-test5 app in the DeepStream SDK. The app takes streaming video inputs from multiple synchronized camera streams, decodes the incoming streams, performs 3D inference & temporal tracking using instance banking, and sends metadata over Kafka to other Metropolis Microservices, using the defined Protobuf schema.

Sparse4D is a Birds-Eye-View (BEV) detection model that performs 3D object detection and tracking across multiple synchronized camera sensors. The model maintains object identity across frames through temporal tracking with instance banking, providing 3D position, orientation, velocity, and persistent instance IDs for each detected object.

Configuration Options

The Sparse4D RTVI-CV Pipeline has several key configuration options:

Implementation Details#

Since the application is built using DeepStream SDK deepstream-test5-app, refer to the following documentation for more details:

Kafka Integration#

The Real Time Video Intelligence CV Microservice publishes detection and tracking metadata to Kafka for downstream processing by other microservices such as Multi-Camera Tracking (MCT), Real-Time Location Systems (RTLS), and Facility Safety Logic (FSL).

Kafka Topics

The microservice publishes messages to configurable Kafka topics. By default, detection metadata is sent to the deepstream-metadata topic.

Configuration

Configure Kafka integration in the DeepStream application configuration file:

[message-broker]
enable=1
broker-proto-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_kafka_proto.so
broker-conn-str=kafka-broker:9092
topic=deepstream-metadata
comp-id=perception-app

Message Formats#

Detection and tracking metadata is serialized as Protocol Buffer messages using the Frame message type defined in the Protobuf Schema.

Message Header:

  • message_type: "frame" (default, if not specified)

Message Structure:

Key Fields:

Frame message:

  • version: Schema version

  • id: Frame identifier

  • timestamp: Frame timestamp in UTC format

  • sensorId: Camera/sensor identifier

  • objects: Array of detected objects with bounding boxes, classifications, tracking IDs, and attributes

  • info: Additional metadata (key-value pairs)

Object message:

  • id: Object tracking ID

  • bbox: Bounding box coordinates (leftX, topY, rightX, bottomY) for 2D detection

  • bbox3d: 3D bounding box coordinates for Sparse4D detection

  • type: Object class (e.g., Person, Vehicle, Forklift)

  • confidence: Detection confidence score

  • coordinate: 3D position (x, y, z) for Sparse4D detection

  • speed: Object velocity for Sparse4D tracking

  • dir: Movement direction vector for Sparse4D tracking

  • info: Additional object attributes
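A downstream consumer deserializes the Frame protobuf and iterates over its objects. The sketch below uses a plain dict with the same field names to show the access pattern; the dict and its values are stand-ins, not the real protobuf API:

```python
def summarize_frame(frame):
    """Group detected object tracking IDs by object class."""
    per_class = {}
    for obj in frame.get("objects", []):
        per_class.setdefault(obj["type"], []).append(obj["id"])
    return per_class

# Illustrative frame; field names follow the schema above, values are made up.
frame = {
    "id": "1",
    "sensorId": "uniqueSensorID1",
    "objects": [
        {"id": "17", "type": "Person", "confidence": 0.92,
         "bbox": {"leftX": 10, "topY": 20, "rightX": 90, "bottomY": 200}},
        {"id": "21", "type": "Forklift", "confidence": 0.88,
         "bbox": {"leftX": 120, "topY": 40, "rightX": 300, "bottomY": 220}},
    ],
}
print(summarize_frame(frame))  # {'Person': ['17'], 'Forklift': ['21']}
```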

DeepStream Configuration Files#

The following table lists the DeepStream configuration files for different blueprint deployments. These configurations define the pipeline behavior, model parameters, and integration settings for 2D and 3D computer vision models.

DeepStream configuration files are present in the RTVI-CV Docker container at the locations mentioned below.

Smart City Blueprint#

Configuration Location: deployments/smartcities/smc-app/deepstream/configs/

Smart City Configuration Files#

| Configuration File | Description |
| --- | --- |
| rtdetr-960x544.txt | Primary GIE (PGIE) configuration for RT-DETR |
| run_config-api-rtdetr-protobuf.txt | Main DeepStream pipeline configuration for RT-DETR & Grounding DINO |
| config_triton_nvinferserver_gdino.txt | Triton Inference Server configuration for the Grounding DINO model |

Note: A few config parameters are updated dynamically based on the model name and number of streams.

Warehouse 2D Blueprint#

Please refer to the Warehouse 2D Blueprint documentation for configurations.

Warehouse 3D Blueprint#

Please refer to the Warehouse 3D Blueprint documentation for configurations.

Customization of Microservice#

The microservice provides flexible customization options to adapt to different deployment requirements, models, and use cases. This section describes the key customization areas.

Model Customization#

Updating Model Checkpoints for provided models

The microservice supports RT-DETR and Grounding DINO detection models for 2D object detection:

For custom 2D detection models (RT-DETR and Grounding DINO) trained with TAO Toolkit:

  1. Export your model to ONNX format using TAO

  2. Update deepstream application configuration file to reference your model:

[primary-gie]
model-engine-file=<custom_model_name_b4_gpu0_fp16>.engine
onnx-file=<custom_model_name>.onnx
batch-size=4 # set to the batch size of your model

Update the PGIE configuration file (nvinfer or nvinferserver) for your custom model in the DeepStream application configuration file.

For integrating custom model architectures (beyond RT-DETR and Grounding DINO), you will need to export your model to ONNX format, configure the DeepStream nvinfer plugin with appropriate preprocessing and parsing parameters, and potentially implement custom bounding box parsers. Refer to the DeepStream nvinfer Plugin Guide for detailed integration steps.

For 3D object detection models, refer to the Integrating a Sparse4D Model Checkpoint section in the 3D Multi Camera Detection and Tracking (Sparse4D) documentation.

Tracker Customization#

Tracker Selection and Configuration

DeepStream supports multiple tracking algorithms. You can configure tracker section in the deepstream application configuration file as per your requirements. For example:

[tracker]
enable=1
tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=config_tracker_NvDCF_perf.yml
display-tracking-id=1

Tracker Algorithm Options

  • NvDCF: Discriminative Correlation Filter (recommended for most use cases)

  • IOU: Intersection over Union tracker (lightweight, best for static cameras)

  • DeepSORT: Deep learning-based tracker (best accuracy, higher compute)

For detailed tracker configuration options, parameters, and algorithm-specific settings, refer to the Gst-nvtracker Plugin Documentation.

Message Broker Customization#

Kafka Configuration

Customize message broker output in the deepstream application configuration file:

[message-broker]
enable=1
broker-proto-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_kafka_proto.so
broker-conn-str=kafka-broker:9092
topic=deepstream-metadata
comp-id=perception-app

Redis Configuration

For Redis message broker, use the deepstream application configuration file:

[message-broker]
enable=1
broker-proto-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_redis_proto.so
broker-conn-str=redis-server:6379

For detailed message broker configuration options, parameters, and settings, refer to the Gst-nvmsgbroker Plugin Documentation.

Application Customization#

The application can be customized to add custom processing logic, modify metadata handling, or integrate additional GStreamer elements.

Source Code Location

The application source code is typically located in /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/metropolis_perception_app/:

metropolis_perception_app/
├── metropolis_perception_app.c           # Main application with pipeline setup
├── metropolis_perception_app.h           # Header with structure definitions
├── Makefile                              # Build configuration

Key Customization Points

  1. Adding Custom Probes

    Add probes to access metadata and buffers at specific pipeline elements:

    static GstPadProbeReturn
    custom_pad_probe(GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
    {
        GstBuffer *buf = (GstBuffer *) info->data;
        NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf);
    
        // Access and process metadata
        for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL;
             l_frame = l_frame->next) {
            NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);
            // Custom processing per frame
        }
    
        return GST_PAD_PROBE_OK;
    }
    
    // Attach probe to a pad
    GstPad *sink_pad = gst_element_get_static_pad(element, "sink");
    gst_pad_add_probe(sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
                      custom_pad_probe, NULL, NULL);
    gst_object_unref(sink_pad);
    

Building Custom Application

After modifying the source code, rebuild the application:

cd metropolis_perception_app/
make clean
make

Deployment Considerations

When deploying customized applications using docker compose:

  1. Update the Docker container to include your custom binary:

    COPY metropolis_perception_app /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/metropolis_perception_app/
    RUN chmod +x /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/metropolis_perception_app/metropolis_perception_app
    
  2. Ensure all dependencies and libraries are available in the container

  3. Update configuration files to match your custom processing requirements

Common Customization Use Cases

  • Custom Object Filtering: Filter detected objects based on size, confidence, or region of interest

  • Custom Analytics: Implement line crossing, zone intrusion, or occupancy counting

  • External System Integration: Connect to databases, REST APIs, or other services

  • Performance Monitoring: Add custom telemetry and performance metrics collection
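As an example of the first use case, confidence and region-of-interest filtering can be prototyped in pure Python before porting it into the C probe. The helper names and ROI convention below are hypothetical; only the bbox field names (leftX, topY, rightX, bottomY) come from the metadata schema:

```python
def in_roi(bbox, roi):
    """True if the bbox center falls inside the (left, top, right, bottom) ROI."""
    cx = (bbox["leftX"] + bbox["rightX"]) / 2
    cy = (bbox["topY"] + bbox["bottomY"]) / 2
    left, top, right, bottom = roi
    return left <= cx <= right and top <= cy <= bottom

def filter_objects(objects, min_confidence=0.5, roi=(0, 0, 1920, 1080)):
    """Keep objects above a confidence threshold whose center lies in the ROI."""
    return [o for o in objects
            if o["confidence"] >= min_confidence and in_roi(o["bbox"], roi)]

objs = [
    {"id": "1", "confidence": 0.9,
     "bbox": {"leftX": 100, "topY": 100, "rightX": 200, "bottomY": 300}},
    {"id": "2", "confidence": 0.3,
     "bbox": {"leftX": 100, "topY": 100, "rightX": 200, "bottomY": 300}},
]
print([o["id"] for o in filter_objects(objs)])  # ['1']
```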

RTSP Streaming#

| Variable | Description | Default |
| --- | --- | --- |
| RTVI_RTSP_LATENCY | RTSP latency (ms) | 2000 |
| RTVI_RTSP_TIMEOUT | RTSP timeout (ms) | 2000 |
| RTVI_RTSP_RECONNECTION_INTERVAL | Time to detect stream interruption and wait for reconnection (seconds) | 5.0 |
| RTVI_RTSP_RECONNECTION_WINDOW | Duration to attempt reconnection after interruption (seconds) | 60.0 |
| RTVI_RTSP_RECONNECTION_MAX_ATTEMPTS | Maximum reconnection attempts | 10 |

Kafka Configuration#

| Variable | Description | Default |
| --- | --- | --- |
| KAFKA_ENABLED | Enable Kafka integration | true |
| KAFKA_BOOTSTRAP_SERVERS | Kafka broker address | localhost:9092 |
| KAFKA_TOPIC | Topic for embedding messages | mdx-bev |
| ERROR_MESSAGE_TOPIC | Topic/channel for error messages | mdx-bev-errors |

OpenTelemetry Support#

The microservice supports OpenTelemetry for exporting metrics to observability platforms like Prometheus and Grafana.

Configuration#

Configure OpenTelemetry using the following environment variables:

| Environment Variable | Description |
| --- | --- |
| OTEL_SDK_DISABLED | Set to "true" to disable all telemetry (default: "false") |
| OTEL_SERVICE_NAME | Service identifier (e.g., "rtvi-cv") |
| OTEL_EXPORTER_OTLP_ENDPOINT | Collector base URL (e.g., "http://otel-collector:4318") |
| OTEL_METRIC_EXPORT_INTERVAL | Metric export interval in milliseconds (default: 60000) |
| OTEL_METRICS_EXPORTER | Export destination: "console", "otlp", or "none" (default: "otlp") |
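For example, the variables above could be set in a Docker Compose service definition; the endpoint and values shown here are illustrative, not required:

```yaml
environment:
  - OTEL_SDK_DISABLED=false
  - OTEL_SERVICE_NAME=rtvi-cv
  - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
  - OTEL_METRIC_EXPORT_INTERVAL=60000
  - OTEL_METRICS_EXPORTER=otlp
```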

Additionally, set the following parameters in the DeepStream application configuration file:

[tiled-display]
enable=3

[sinkN]
nvdslogger=1

Supported Prometheus Metrics#

The following metrics are exported to Prometheus for monitoring and alerting:

Stream Performance Metrics:

| Metric Name | Description | Typical Value |
| --- | --- | --- |
| stream_fps | Frames per second processed for each stream | 25-30 (depends on source) |
| stream_latency_milliseconds | End-to-end pipeline latency in milliseconds (from frame capture to metadata output) | 30-100 ms (lower is better) |
| stream_frame_number | Current frame number being processed for each stream (incremental counter) | Monotonically increasing |
| stream_count | Total number of active streams being processed | Based on configuration |

System Resource Metrics:

| Metric Name | Description |
| --- | --- |
| cpu_utilization | CPU utilization percentage across all cores |
| gpu_utilization | GPU compute utilization percentage |
| ram_memory_gb | System RAM memory usage in gigabytes |
| gpu_memory_gb | GPU memory usage in gigabytes |

Note

gpu_memory_gb is not applicable on aarch64 devices (e.g., Jetson Thor) as they use unified memory, so it returns -1.

OpenTelemetry Collector Configuration#

Ensure an OpenTelemetry Collector is running on the configured otlp-uri endpoint. To filter out inactive stream metrics, add the following processor to your collector configuration:

processors:
  filter/drop_inactive_streams:
    error_mode: ignore
    metrics:
      datapoint:
        - 'metric.name == "stream_fps" and value_double == -1.0'
        - 'metric.name == "stream_latency_milliseconds" and value_double == -1.0'
        - 'metric.name == "stream_frame_number" and value_int == -1'
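For clarity, the predicate that such a filter encodes can be sketched in Python (metric names follow the Prometheus tables above; this is illustrative, not collector code):

```python
# Illustrative sketch of the drop rule: a datapoint is discarded when a
# per-stream metric reports the inactive-stream sentinel value -1.
PER_STREAM_METRICS = {"stream_fps", "stream_latency_milliseconds", "stream_frame_number"}

def is_inactive(name, value):
    return name in PER_STREAM_METRICS and value == -1

datapoints = [("stream_fps", 29.7), ("stream_fps", -1.0), ("stream_count", 4)]
kept = [dp for dp in datapoints if not is_inactive(*dp)]
# kept -> [("stream_fps", 29.7), ("stream_count", 4)]
```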

If exporting to Prometheus, set metric_expiration to a value greater than or equal to the OTLP export interval so that stale metrics are dropped:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    metric_expiration: 4s

Runtime Configuration Using REST API#

The OpenTelemetry HTTP exporter can be configured at runtime using the metrics endpoint with custom headers. This allows dynamic configuration without restarting the microservice.

Available Headers:

  • X-REFRESH-PERIOD: Set the metrics push interval in milliseconds. If the OpenTelemetry exporter is not running, it starts the exporter at the default endpoint (http://localhost:4318) with the specified interval.

  • X-OTLP-URL: Set the OpenTelemetry collector endpoint. Starts posting metrics to the specified http://ip:port with default interval (5000 milliseconds).

Examples:

Set refresh interval to 3000 milliseconds (starts exporter at default endpoint if not running):

curl -XGET 'http://localhost:9000/api/v1/metrics' -H "X-REFRESH-PERIOD:3000"

Set custom collector endpoint (uses default 5000 milliseconds interval):

curl -XGET 'http://localhost:9000/api/v1/metrics' -H "X-OTLP-URL:http://192.168.1.100:4318"

Set both custom endpoint and interval:

curl -XGET 'http://localhost:9000/api/v1/metrics' -H "X-REFRESH-PERIOD:3000" -H "X-OTLP-URL:http://192.168.1.100:4318"

Note

If OTEL_SDK_DISABLED="true" is set in the environment variables, using the above runtime configuration will enable OpenTelemetry metrics support. The X-REFRESH-PERIOD value is specified in milliseconds.

Disable the OpenTelemetry HTTP exporter:

curl -XGET 'http://localhost:9000/api/v1/metrics' -H "X-REFRESH-PERIOD:-1"
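The same requests can also be composed from Python with only the standard library. The sketch below builds (but does not send) the request, reusing the example endpoint and header values above:

```python
import urllib.request

# Build the runtime-configuration request without sending it. Calling
# urllib.request.urlopen(req) would issue the GET against a running service.
req = urllib.request.Request(
    "http://localhost:9000/api/v1/metrics",
    headers={
        "X-REFRESH-PERIOD": "3000",
        "X-OTLP-URL": "http://192.168.1.100:4318",
    },
)
```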

Troubleshooting#

Common Issues#

Export the following environment variables in your working environment:

  • DEEPSTREAM_ENABLE_SENSOR_ID_EXTRACTION=1: Enables sensor ID extraction, which adds support for the updated schema required by rtvi-cv

  • GST_ENABLE_CUSTOM_PARSER_MODIFICATIONS=1: Enables custom parser changes that patch the SEI handling logic in the OSS parser code to prevent crashes caused by a NULL SEI pointer

Issue: Poor performance with a large number of streams

To maintain performance with a large number of streams, enable the sub-batches property in the nvtracker plugin. Refer to the nvtracker plugin documentation for more details.

For example, for 24 streams, set sub-batches to 8:8:8.
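The partitioning behind that string can be illustrated with a small helper (sub_batches here is hypothetical, not an SDK API): it splits the stream count into near-equal groups and joins them with colons.

```python
# Hypothetical helper: build an nvtracker sub-batches string that splits
# num_streams into num_batches near-equal groups (e.g. 24 -> "8:8:8").
def sub_batches(num_streams, num_batches):
    base, rem = divmod(num_streams, num_batches)
    sizes = [base + 1] * rem + [base] * (num_batches - rem)
    return ":".join(str(s) for s in sizes)

print(sub_batches(24, 3))  # -> 8:8:8
```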

Issue: Low FPS / High Latency

Solution:

  • Reduce batch size for latency-critical applications

  • Increase batch size for throughput optimization

  • Check GPU utilization (nvidia-smi)

Issue: Poor Detection Accuracy

Solution:

  • Adjust confidence threshold (pre-cluster-threshold)

  • Verify input image quality and resolution

  • Check preprocessing configuration (normalization, resize)

  • Fine-tune model on domain-specific data using TAO

Issue: TensorRT Engine Build Failure

Solution:

  • Verify ONNX model compatibility with TensorRT version

  • Check available GPU memory during engine build

  • Review TensorRT logs for specific errors

  • Set force_engine_rebuild: True to force a rebuild of the engine

Issue: Sparse4D Multi-Camera Sync Issues

Solution:

  • Verify camera time synchronization (NTP)

  • Check batch-size matches num_sensors

  • Ensure all cameras are streaming at same FPS

  • Review nvstreammux configuration

Debugging Tips#

  1. Enable Verbose Logging

export NVDS_LOG_LEVEL=4  # Debug level

  2. Monitor Performance

# Check GPU utilization
nvidia-smi dmon -s u

# Monitor DeepStream FPS
# Check console output for "FPS:" lines

  3. Visualize Outputs

Enable on-screen display (OSD) in the DeepStream config:

[osd]
enable=1
border-width=3
text-size=15

  4. Dump Intermediate Tensors

For debugging model issues, enable tensor dumping:

# In config.yaml (Sparse4D)
dump_frames: True
dump_max_frames: 50

For additional troubleshooting guidance, see the DeepStream SDK Troubleshooting Guide.

Error Propagation Configuration#

The microservice supports error propagation using the message API with Redis protocol adaptors to monitor pipeline errors and stream-related issues. Configure error propagation in the application configuration:

[source-list]
# Set the key below to 1 to enable error propagation to a given adaptor
enable-error-propagation=0
# Once error propagation is enabled, uncomment and update the keys below accordingly.
# All error messages (stream-related and GStreamer-based) are published to the user-defined topic.
#proto-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_redis_proto.so
#conn-str=<host>;<port>
#topic=<topic>

Configuration Parameters:

  • enable-error-propagation: Set to 1 to enable error propagation (default: 0)

  • proto-lib: Path to the protocol adaptor library (libnvds_redis_proto.so)

  • conn-str: Connection string for the message broker (format: <host>;<port>)

  • topic: Base topic name for error messages
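As a small illustration of the conn-str format (parse_conn_str is a hypothetical helper, not part of the adaptor):

```python
# Hypothetical helper: split a conn-str of the form "<host>;<port>" into
# the host and integer port used to reach the message broker.
def parse_conn_str(conn_str):
    host, port = conn_str.split(";")
    return host, int(port)

print(parse_conn_str("localhost;6379"))  # -> ('localhost', 6379)
```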
