CV Pipeline Customization#

The VSS CV pipeline consists of three main customizable components:

  • an object detector

  • a multi-object tracker

  • CV models used inside the tracker

Each component can be customized with a fine-tuned version based on the use case. The pipeline can also be tuned for higher performance by running multiple chunks of video on the same GPU. These customizations can be made before deployment in the following ways:

Customizations in Helm Chart Deployment#

Prerequisites#

The engine files for the models are generated when the VSS container is initialized. These engine files are cached in NGC_MODEL_CACHE. If you are configuring a custom model, make sure to delete any stale TensorRT engines in NGC_MODEL_CACHE before restarting the VSS container. This can be done by running the following command:

sudo microk8s kubectl delete pvc vss-ngc-model-cache-pvc
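
To confirm the stale cache PVC has been removed before restarting VSS, you can list the PVCs:

sudo microk8s kubectl get pvc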

Customizing the Detector#

The Grounding DINO detector can be customized to use a locally available ONNX model; the CV pipeline will generate the engine file for the custom model. The detector can also be customized to run at a specific frame interval. Here are the steps to customize the Grounding DINO detector:

  1. Add the GDINO_MODEL_PATH environment variable, extraPodVolumes and extraPodVolumeMounts to the overrides file described in Enabling CV Pipeline: Set-of-Marks (SOM) and Metadata as shown below.

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: GDINO_MODEL_PATH
            value: "/models/gdino/<your_finetuned_gdino_model>.onnx"
  extraPodVolumes:
  - name: gdino
    hostPath:
      path: <GDINO_MODEL_DIR>   # Path on host
  extraPodVolumeMounts:
  - name: gdino
    mountPath: /models/gdino
  2. The detector can also be customized to run at a specific frame interval by setting GDINO_INFERENCE_INTERVAL as shown in Enabling CV Pipeline: Set-of-Marks (SOM) and Metadata. GDINO_INFERENCE_INTERVAL defaults to 1, meaning Grounding DINO runs inference on every alternate frame. For maximum accuracy, set GDINO_INFERENCE_INTERVAL to 0 so that Grounding DINO runs inference on every frame, as in the sketch below.
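
For example, a minimal overrides sketch that runs inference on every frame might look like the following (it reuses the same env structure as the GDINO_MODEL_PATH example above):

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: GDINO_INFERENCE_INTERVAL
            value: "0"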

Customizing the Tracker#

The default tracker configuration is provided with the VSS blueprint and can be accessed in two ways:

  • cv_pipeline_tracker_config.yml in nvidia-blueprint-vss/charts/vss/values.yaml of the VSS Blueprint

  • Included inside the VSS container at /opt/nvidia/via/config/default_tracker_config.yml

The default tracker configuration is tuned for performance, with object segmentation disabled. For more details about the tracker configuration parameters, refer to NvMultiObjectTracker Library.

A custom tracker configuration using MaskTracker for higher accuracy tracking and object segmentation is also provided as part of the VSS blueprint. This higher accuracy configuration is included inside the VSS container at /opt/nvidia/via/config/config_tracker_MaskTracker.yml.

To use the higher accuracy tracker configuration, copy the contents of config_tracker_MaskTracker.yml to the overrides file described in Enabling CV Pipeline: Set-of-Marks (SOM) and Metadata as shown below.

vss:
  configs:
    cv_pipeline_tracker_config.yml:
      <Paste the tracker configs here, such as below>
      BaseConfig:
        minDetectorConfidence: 0.3529
      TargetManagement:
        enableBboxUnClipping: 1
        preserveStreamUpdateOrder: 0
        ...
      Segmenter:
        segmenterType: 1
        # below is the default Segmenter config file saved in VSS container already
        segmenterConfigPath: "/opt/nvidia/via/config/config_tracker_module_Segmenter.yml"
  applicationSpecs:
    vss-deployment:
      ...

For finer control, individual parameters can be modified in the cv_pipeline_tracker_config.yml section of the overrides file, as in the sketch below.
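
For example, a minimal sketch that overrides only the detector confidence threshold (the parameter name appears in the example above; the value here is purely illustrative):

vss:
  configs:
    cv_pipeline_tracker_config.yml:
      BaseConfig:
        minDetectorConfidence: 0.5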

Customizing Models in the Tracker#

The tracker internally uses the following models:

  • Re-identification model for higher accuracy tracking as mentioned in VSS CV Pipeline Models

  • MaskTracker uses SAM2 for tracking and segmentation as mentioned in VSS CV Pipeline Models. It includes image encoder, memory attention, mask decoder, and memory encoder models.

All of these default models can be replaced with custom fine-tuned models, which can be configured in the cv_pipeline_tracker_config.yml section of the overrides file as described below.

  1. Mount the custom models: The custom models must be locally available on the host machine. Mount the host machine paths by adding extraPodVolumes and extraPodVolumeMounts to the overrides file:

vss:
  extraPodVolumes:
  - name: tracker-models
    hostPath:
      path: <TRACKER_MODELS_DIR>   # Path on host
  extraPodVolumeMounts:
  - name: tracker-models
    mountPath: /models/tracker_models
  applicationSpecs:
    vss-deployment:
      ...
  2. Provide fine-tuned ONNX models: Once custom ONNX models are available, change the following sections in the overrides file.

  • To update the ReID model, replace the existing onnxFile field in ReID with your own fine-tuned ONNX model in the cv_pipeline_tracker_config.yml section.

    vss:
      configs:
        cv_pipeline_tracker_config.yml:
          ...
          ReID:
            onnxFile: /models/tracker_models/<path to your custom reid onnx file>
          ...
    
  • To update the Segmenter models, change segmenterConfigPath to /opt/configs/cv_pipeline_tracker_module_Segmenter.yml in the cv_pipeline_tracker_config.yml section; this is the path where the overridden Segmenter config file is mapped inside the container. Then create a cv_pipeline_tracker_module_Segmenter.yml section in the overrides file and replace the existing onnxFile fields in ImageEncoder, MemoryAttention, MaskDecoder, and MemoryEncoder with your own fine-tuned ONNX models.

    vss:
      configs:
        cv_pipeline_tracker_config.yml:
          ...
          Segmenter:
            segmenterType: 1
            # the Segmenter config overridden in the helm chart will be mapped to below path in VSS container
            segmenterConfigPath: /opt/configs/cv_pipeline_tracker_module_Segmenter.yml
        cv_pipeline_tracker_module_Segmenter.yml:
          ...
          ImageEncoder:
            ...
            onnxFile: /models/tracker_models/<path to your custom image encoder onnx file>
          MemoryAttention:
            ...
            onnxFile: /models/tracker_models/<path to your custom memory attention onnx file>
          MaskDecoder:
            ...
            onnxFile: /models/tracker_models/<path to your custom mask decoder onnx file>
          MemoryEncoder:
            ...
            onnxFile: /models/tracker_models/<path to your custom memory encoder onnx file>
    

The engine files will be generated internally by the VSS container. Hence, do not change the modelEngineFile field, which contains dummy paths.

  3. Provide engine files: Alternatively, engine files can be provided directly in the overrides file. If provided, they take precedence over ONNX files.

  • To update the ReID model, replace the existing modelEngineFile field in ReID with your own custom engine file in the cv_pipeline_tracker_config.yml section.

    vss:
      configs:
        cv_pipeline_tracker_config.yml:
          ...
          ReID:
            modelEngineFile: /models/tracker_models/<path to your custom reid engine file>
          ...
    
  • To update the Segmenter models, change segmenterConfigPath to /opt/configs/cv_pipeline_tracker_module_Segmenter.yml in the cv_pipeline_tracker_config.yml section; this is the path where the overridden Segmenter config file is mapped inside the container. Then create a cv_pipeline_tracker_module_Segmenter.yml section in the overrides file and replace the existing modelEngineFile fields in ImageEncoder, MemoryAttention, MaskDecoder, and MemoryEncoder with your own custom engine files.

    vss:
      configs:
        cv_pipeline_tracker_config.yml:
          ...
          Segmenter:
            segmenterType: 1
            # the Segmenter config overridden in the helm chart will be mapped to below path in VSS container
            segmenterConfigPath: /opt/configs/cv_pipeline_tracker_module_Segmenter.yml
        cv_pipeline_tracker_module_Segmenter.yml:
          ...
          ImageEncoder:
            ...
            modelEngineFile: /models/tracker_models/<path to your custom image encoder engine file>
          MemoryAttention:
            ...
            modelEngineFile: /models/tracker_models/<path to your custom memory attention engine file>
          MaskDecoder:
            ...
            modelEngineFile: /models/tracker_models/<path to your custom mask decoder engine file>
          MemoryEncoder:
            ...
            modelEngineFile: /models/tracker_models/<path to your custom memory encoder engine file>
    

Customizing CV Pipeline Chunks#

The CV pipeline can be configured to run multiple chunks of video on the same GPU. By default, the CV pipeline runs a maximum of 2 chunks of video per GPU. This can be customized by setting the NUM_CV_CHUNKS_PER_GPU parameter. Increasing NUM_CV_CHUNKS_PER_GPU gives better performance at the cost of increased GPU utilization and memory consumption. Recommended values for increased performance are 4 and 6. The NUM_CV_CHUNKS_PER_GPU parameter can be configured in the overrides file as shown below.

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: NUM_CV_CHUNKS_PER_GPU
            value: "4"
            ...

CV Customizations in Docker Compose Deployment#

Customizing the Detector#

The Grounding DINO detector can be customized to use a locally available ONNX model; the CV pipeline will generate the engine file for the custom model. The detector can also be customized to run at a specific frame interval. Here are the steps to customize the Grounding DINO detector:

  1. Set the GDINO_MODEL_PATH environment variable to point to the ONNX model file. The model file must be present in the MODEL_ROOT_DIR directory, which is mounted into the Docker container.

GDINO_MODEL_PATH=<MODEL_ROOT_DIR>/<path_to_your_finetuned_gdino_model>.onnx
  2. The detector can also be customized to run at a specific frame interval by setting GDINO_INFERENCE_INTERVAL. GDINO_INFERENCE_INTERVAL defaults to 1, meaning Grounding DINO runs inference on every alternate frame. For maximum accuracy, set GDINO_INFERENCE_INTERVAL to 0 so that Grounding DINO runs inference on every frame. For example:

GDINO_INFERENCE_INTERVAL=0

Customizing the Tracker#

The default tracker configuration is provided with the VSS blueprint and can be accessed in the VSS container at /opt/nvidia/via/config/default_tracker_config.yml.

The default tracker configuration is tuned for performance. For more details about the tracker configuration parameters, refer to NvMultiObjectTracker Library.

A custom tracker configuration using MaskTracker for higher accuracy tracking and object segmentation is also provided as part of the VSS blueprint. This higher accuracy configuration is included inside the VSS container at /opt/nvidia/via/config/config_tracker_MaskTracker.yml.

To use the higher accuracy tracker configuration, copy the file config_tracker_MaskTracker.yml to the host and set its path in the CV_PIPELINE_TRACKER_CONFIG environment variable:

CV_PIPELINE_TRACKER_CONFIG=<path to config_tracker_MaskTracker.yml>
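
One way to obtain the file is to copy it out of a running VSS container; a sketch, assuming the container is named vss (adjust the name to your deployment):

docker cp vss:/opt/nvidia/via/config/config_tracker_MaskTracker.yml <path on host>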

This is just one example of a tracker configuration. You can create your own tracker configuration by referring to the NvMultiObjectTracker Library and modifying the parameters to suit your requirements.

Customizing Models in the Tracker#

The tracker internally uses the following models:

  • Re-identification model for higher accuracy tracking as mentioned in VSS CV Pipeline Models

  • MaskTracker uses SAM2 for tracking and segmentation as mentioned in VSS CV Pipeline Models. It includes image encoder, memory attention, mask decoder, and memory encoder models.

All of these default models can be replaced with custom fine-tuned models, which can be configured in the custom tracker configuration file set by the CV_PIPELINE_TRACKER_CONFIG environment variable.

To customize the models, follow the steps below:

  1. Mount the custom models: The custom models must be locally available on the host machine in the MODEL_ROOT_DIR directory.

  2. Update fine-tuned ONNX models in the tracker configuration: Once custom ONNX models are available, change the following sections in the custom tracker configuration file set by the CV_PIPELINE_TRACKER_CONFIG environment variable.

  • To update the ReID model, replace the existing onnxFile field in ReID with your own fine-tuned ONNX model.

    ...
    ReID:
      ...
      onnxFile: <MODEL_ROOT_DIR>/<path to your custom reid onnx file>
      ...
    
  • To update the Segmenter models, update the segmenterConfigPath field in the tracker configuration file to point to your custom segmenter config file (which must be located in the MODEL_ROOT_DIR directory). In your custom segmenter config file config_tracker_module_Segmenter_custom.yml, replace the existing onnxFile fields in ImageEncoder, MemoryAttention, MaskDecoder, and MemoryEncoder with your own fine-tuned ONNX models.

    ...
    Segmenter:
      segmenterConfigPath: <path to your custom segmenter config file> # must be located in <MODEL_ROOT_DIR>. For example, <MODEL_ROOT_DIR>/config_tracker_module_Segmenter_custom.yml
    

    In your custom segmenter config file config_tracker_module_Segmenter_custom.yml, update the onnx paths:

    # Assuming below onnxFiles are the custom models
    ...
    ImageEncoder:
      ...
      onnxFile: <MODEL_ROOT_DIR>/<path to your custom image encoder onnx file>
    MemoryAttention:
      ...
      onnxFile: <MODEL_ROOT_DIR>/<path to your custom memory attention onnx file>
    MaskDecoder:
      ...
      onnxFile: <MODEL_ROOT_DIR>/<path to your custom mask decoder onnx file>
    MemoryEncoder:
      ...
      onnxFile: <MODEL_ROOT_DIR>/<path to your custom memory encoder onnx file>
    

The engine files are generated internally by the VSS container. Do not change the modelEngineFile field, which contains dummy paths.

  3. Provide engine files: Alternatively, engine files can be provided directly in the custom tracker configuration file. If provided, they take precedence over ONNX files.

  • To update the ReID model, replace the existing modelEngineFile field in ReID with your own custom engine file.

    ...
    ReID:
      ...
      modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom reid engine file>
      ...
    
  • To update the Segmenter models, update the segmenterConfigPath field in the tracker configuration file to point to your custom segmenter config file (which must be located in the MODEL_ROOT_DIR directory). In your custom segmenter config file config_tracker_module_Segmenter_custom.yml, replace the existing modelEngineFile fields in ImageEncoder, MemoryAttention, MaskDecoder, and MemoryEncoder with your own custom engine files.

    ...
    Segmenter:
      ...
      segmenterConfigPath: <path to your custom segmenter config file> # must be located in <MODEL_ROOT_DIR>. For example, <MODEL_ROOT_DIR>/config_tracker_module_Segmenter_custom.yml
    

    In your custom segmenter config file config_tracker_module_Segmenter_custom.yml, update the engine file paths:

    # Assuming below engineFiles are the custom models
    ...
    ImageEncoder:
      ...
      modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom image encoder engine file>
    MemoryAttention:
      ...
      modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom memory attention engine file>
    MaskDecoder:
      ...
      modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom mask decoder engine file>
    MemoryEncoder:
      ...
      modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom memory encoder engine file>
    

Customizing CV Pipeline Chunks#

The CV pipeline can be configured to run multiple chunks of video on the same GPU. By default, the CV pipeline runs a maximum of 2 chunks of video per GPU. This can be customized by setting the NUM_CV_CHUNKS_PER_GPU parameter. Increasing NUM_CV_CHUNKS_PER_GPU gives better performance at the cost of increased GPU utilization and memory consumption. Recommended values for increased performance are 4 and 6. The NUM_CV_CHUNKS_PER_GPU parameter can be configured in the .env file:

NUM_CV_CHUNKS_PER_GPU=4

Recommendations for CV Pipeline Configuration#

We recommend using two sets of configurations for the CV Pipeline:

  • Performance mode: This is the default configuration and is optimized for performance. It uses the following settings:

    • GDINO_INFERENCE_INTERVAL = 1

    • Use a lightweight backbone like Swin-Tiny in the Grounding DINO detector

    • Tracker configuration: default_tracker_config.yml

    • Models used by the tracker: Re-identification model

The metadata generated by the CV pipeline will consist of bounding boxes, object IDs, and object types.

  • Accuracy mode: This configuration can be used for higher accuracy detection and tracking. It can be configured using the following settings:

    • GDINO_INFERENCE_INTERVAL = 0

    • Use a high-accuracy backbone like SwinB in the Grounding DINO detector

    • Tracker configuration: config_tracker_MaskTracker.yml

    • Models used by the tracker: Re-identification model and SAM2 Segmenter models

The metadata generated by the CV pipeline will additionally include object masks along with bounding boxes, object IDs, and object types. In accuracy mode, object masks are overlaid on the video and provided as input to the VLM.
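
As an illustration, in a Docker Compose deployment the accuracy-mode settings above could be expressed in the .env file roughly as follows (paths are placeholders; the MaskTracker config is assumed to have been copied to the host as described earlier):

# Run Grounding DINO inference on every frame
GDINO_INFERENCE_INTERVAL=0
# Point to a SwinB-based Grounding DINO ONNX model under the mounted MODEL_ROOT_DIR
GDINO_MODEL_PATH=<MODEL_ROOT_DIR>/<your_swinb_gdino_model>.onnx
# Use the higher accuracy MaskTracker configuration
CV_PIPELINE_TRACKER_CONFIG=<path to config_tracker_MaskTracker.yml>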

Example: Guide to Customize the CV Pipeline to High Accuracy Mode#

The VSS CV pipeline defaults to performance mode. This section provides the steps to customize the CV pipeline for high accuracy mode.

  1. Clear the PVC cache to delete stale TensorRT engines using the following command:

sudo microk8s kubectl delete pvc vss-ngc-model-cache-pvc
  2. The VLM that gives the best accuracy is GPT-4o. Set the OpenAI API key secret:

export OPENAI_API_KEY=<your_openai_api_key>
sudo microk8s kubectl create secret generic openai-api-key-secret --from-literal=OPENAI_API_KEY=$OPENAI_API_KEY
  3. Generate the swinb.onnx file.

  4. Download the overrides_cv_accuracy_mode.yaml file and change the mount path in overrides_cv_accuracy_mode.yaml to the host folder containing swinb.onnx, at the following location:

extraPodVolumes:
  - name: gdino
    hostPath:
      path: <GDINO_MODEL_DIR>   # Path on host
  5. Deploy using Helm with the overrides_cv_accuracy_mode.yaml file:

sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.4.1.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f overrides_cv_accuracy_mode.yaml

You can refer to Deploy Using Helm for more details on deployment status and accessing the VSS UI.
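
To check that all pods are up before proceeding, a command such as the following can be used:

sudo microk8s kubectl get pods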

  6. After all the pods are running, open the VSS UI and follow these steps:

    • Select the “its” (traffic) video.
    • Tick the Enable CV Metadata checkbox.
    • Set the CV pipeline prompt to: vehicle . truck;0.3
    • Set the prompt to: Traffic Camera Video (its) (With “Enable CV Metadata” Selected).
    • Click the Summarize button.

The above steps are shown in the screenshot below:

[Screenshot: CV Pipeline Customization UI]
  7. The summarization process will start in VSS. After the process completes, the VSS UI is updated with the summary and the Set-of-Marks overlay video as shown below. The video contains the IDs and masks overlaid on the objects. The Set-of-Marks video contains only the frames sampled for the VLM, not all the frames of the input video.

[Screenshot: CV Pipeline Customization IDs and masks overlaid]
  8. Based on the summary and Set-of-Marks video, you can ask questions in the chat. One sample question is:

Do you observe any abnormal events in the video clip? If so, which cars are involved?

Verify that you observe an answer similar to:

[Screenshot: CV Pipeline Customization QnA example]