CV Pipeline Customization#

The VSS CV pipeline consists of three main customizable components: an object detector, a tracker, and the CV models used inside the tracker. Each component can be customized with a fine-tuned version based on the use case. The pipeline can also be customized for higher performance by running multiple chunks of video on the same GPU. These customizations can be done before deployment in the following ways:

Customizations in Helm Chart Deployment#

Prerequisites#

The engine files for the models are generated when the VSS container is initialized. These engine files are cached in NGC_MODEL_CACHE. If you are configuring a custom model, make sure to delete any stale TensorRT engines in NGC_MODEL_CACHE before restarting the VSS container. This can be done by running the following command:

sudo microk8s kubectl delete pvc vss-ngc-model-cache-pvc

Customizing the Detector#

The Grounding DINO detector can be customized to use a locally available ONNX model. The CV pipeline generates the engine file for the custom model. The detector can also be customized to run at a specific frame interval. Here are the steps to customize the Grounding DINO detector:

  1. Add the GDINO_MODEL_PATH environment variable, extraPodVolumes, and extraPodVolumeMounts to the overrides file described in Enabling CV Pipeline: Set-Of-Marks (SOM) & Metadata, as shown below.

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: GDINO_MODEL_PATH
            value: "/models/gdino/<your_finetuned_gdino_model>.onnx"
  extraPodVolumes:
  - name: gdino
    hostPath:
      path: <GDINO_MODEL_DIR>   # Path on host
  extraPodVolumeMounts:
  - name: gdino
    mountPath: /models/gdino
  2. The detector can also be customized to run at a specific frame interval by setting GDINO_INFERENCE_INTERVAL as shown in Enabling CV Pipeline: Set-Of-Marks (SOM) & Metadata. The value of GDINO_INFERENCE_INTERVAL is 1 by default, i.e., Grounding DINO runs inference on every alternate frame. For maximum accuracy, set GDINO_INFERENCE_INTERVAL to 0, i.e., Grounding DINO runs inference on every frame, as shown in the sketch below.
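
For example, to force per-frame inference, the environment variable can be added to the same overrides structure shown above for GDINO_MODEL_PATH:

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: GDINO_INFERENCE_INTERVAL
            value: "0"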

Customizing the Tracker#

The default tracker configuration is provided with the VSS blueprint and can be accessed in two ways:

  • cv_pipeline_tracker_config.yml in nvidia-blueprint-vss/charts/vss/values.yaml of the VSS Blueprint

  • Included inside the VSS container at /opt/nvidia/via/config/default_tracker_config.yml

The default tracker configuration is tuned for performance; object segmentation is disabled. For more details about the tracker configuration parameters, refer to NvMultiObjectTracker Library.
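
To inspect the configuration shipped inside the container, one option is to print it from the running pod. The deployment and container names below are assumptions based on the overrides shown on this page and may differ in your deployment:

sudo microk8s kubectl exec deploy/vss-deployment -c vss -- cat /opt/nvidia/via/config/default_tracker_config.yml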

A custom tracker configuration for both higher accuracy tracking and SAM2 object segmentation is also provided as part of the VSS blueprint. This higher accuracy configuration is included inside the VSS container at /opt/nvidia/via/config/config_tracker_NvDCF_accuracy_SAM2.yml.

To use the higher accuracy tracker configuration, copy the contents of config_tracker_NvDCF_accuracy_SAM2.yml into the overrides file described in Enabling CV Pipeline: Set-Of-Marks (SOM) & Metadata, as shown below.

vss:
  configs:
    cv_pipeline_tracker_config.yml:
      <Paste the contents of config_tracker_NvDCF_accuracy_SAM2.yml here>
  applicationSpecs:
    vss-deployment:
      ...

For finer control, individual parameters can be modified in the cv_pipeline_tracker_config.yml section of the overrides file.
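
For example, the following sketch overrides a single threshold while leaving the rest of the configuration at its defaults (the parameter name follows the NvMultiObjectTracker configuration schema; verify that it exists in your tracker config file before use):

vss:
  configs:
    cv_pipeline_tracker_config.yml:
      BaseConfig:
        minDetectorConfidence: 0.3   # example value; tune for your use case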

Customizing Models in the Tracker#

The tracker internally uses three models:

  • Re-identification model for higher accuracy tracking as mentioned in VSS CV Pipeline Models

  • SAM2 Segmenter for segmentation as mentioned in VSS CV Pipeline Models. SAM2 internally uses the following (memory bank is not supported yet):

    • ImageEncoder model

    • MaskDecoder model

All three default models can be replaced with custom fine-tuned models, which can be configured in the cv_pipeline_tracker_config.yml section of the overrides file as described below.

  1. Mount the custom models: The custom models must be locally available on the host machine. Mount the host machine paths by adding extraPodVolumes and extraPodVolumeMounts to the overrides file as shown below.

vss:
  extraPodVolumes:
  - name: tracker-models
    hostPath:
      path: <TRACKER_MODELS_DIR>   # Path on host
  extraPodVolumeMounts:
  - name: tracker-models
    mountPath: /models/tracker_models
  applicationSpecs:
    vss-deployment:
       ...

  2. Provide fine-tuned ONNX models: Replace the existing models with your own fine-tuned ONNX models. This requires updating any of the following fields in the cv_pipeline_tracker_config.yml section:

vss:
  configs:
    cv_pipeline_tracker_config.yml:
      ...
      ReID:
        onnxFile: /models/tracker_models/<path to your custom reid onnx file>
      ...
      Segmenter:
        ...
        ImageEncoder:
          onnxFile: /models/tracker_models/<path to your custom image encoder onnx file>
          ...
        MaskDecoder:
          onnxFile: /models/tracker_models/<path to your custom mask decoder onnx file>
          ...

Note that the engine files are generated internally by the VSS container. Hence, do not change the modelEngineFile field, which contains dummy paths.

  3. Provide engine files: Alternatively, provide engine files directly. If engine files are provided, they take precedence over ONNX files.

vss:
  configs:
    cv_pipeline_tracker_config.yml:
      ...
      ReID:
        modelEngineFile: /models/tracker_models/<path to your custom reid engine file>
      ...
      Segmenter:
        ...
        ImageEncoder:
          modelEngineFile: /models/tracker_models/<path to your custom image encoder engine file>
          ...
        MaskDecoder:
          modelEngineFile: /models/tracker_models/<path to your custom mask decoder engine file>
          ...
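
Note that a TensorRT engine is tied to the GPU model and TensorRT version it was built with, so any engine files you provide must be built in an environment matching the VSS container. As a sketch, TensorRT's trtexec tool can build an engine from an ONNX model (file paths are placeholders):

trtexec --onnx=<path to your custom reid onnx file> \
        --saveEngine=<path to your custom reid engine file> \
        --fp16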

Customizing CV Pipeline Chunks#

The CV pipeline can be configured to run multiple chunks of video on the same GPU. By default, the CV pipeline runs a maximum of 2 chunks of video on the same GPU. This can be customized by setting the NUM_CV_CHUNKS_PER_GPU parameter. Increasing NUM_CV_CHUNKS_PER_GPU gives better performance at the cost of increased GPU utilization and memory consumption. Recommended values are 4 and 6 for increased performance. The NUM_CV_CHUNKS_PER_GPU parameter can be configured in the overrides file as shown below.

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: NUM_CV_CHUNKS_PER_GPU
            value: "4"
            ...

CV Customizations in Docker Compose Deployment#

Customizing the Detector#

The Grounding DINO detector can be customized to use a locally available ONNX model. The CV pipeline generates the engine file for the custom model. The detector can also be customized to run at a specific frame interval. Here are the steps to customize the Grounding DINO detector:

  1. Set the GDINO_MODEL_PATH environment variable to point to the ONNX model file. Note that the model file must be present in the MODEL_ROOT_DIR directory, which is mounted into the Docker container.

GDINO_MODEL_PATH=<MODEL_ROOT_DIR>/<path_to_your_finetuned_gdino_model>.onnx
  2. The detector can also be customized to run at a specific frame interval by setting GDINO_INFERENCE_INTERVAL. The value of GDINO_INFERENCE_INTERVAL is 1 by default, i.e., Grounding DINO runs inference on every alternate frame. For maximum accuracy, set GDINO_INFERENCE_INTERVAL to 0, i.e., Grounding DINO runs inference on every frame, for example:

GDINO_INFERENCE_INTERVAL=0

Customizing the Tracker#

The default tracker configuration is provided with the VSS blueprint and can be accessed in the VSS container at /opt/nvidia/via/config/default_tracker_config.yml.

The default tracker configuration is tuned for performance. For more details about the tracker configuration parameters, refer to NvMultiObjectTracker Library.

A custom tracker configuration for both higher accuracy tracking and SAM2 object segmentation is also provided as part of the VSS blueprint. This higher accuracy configuration is included inside the VSS container at /opt/nvidia/via/config/config_tracker_NvDCF_accuracy_SAM2.yml.

To use the higher accuracy tracker configuration, copy the file config_tracker_NvDCF_accuracy_SAM2.yml to the host and set its path in the CV_PIPELINE_TRACKER_CONFIG environment variable.

CV_PIPELINE_TRACKER_CONFIG=<path_to_config_tracker_NvDCF_accuracy_SAM2.yml>

Note that this is just one example of a tracker configuration. You can create your own tracker configuration by referring to the NvMultiObjectTracker Library and modifying the parameters to suit your requirements.

Customizing Models in the Tracker#

The tracker internally uses three models:

  • Re-identification model for higher accuracy tracking as mentioned in VSS CV Pipeline Models

  • SAM2 Segmenter for segmentation as mentioned in VSS CV Pipeline Models. SAM2 internally uses the following (memory bank is not supported yet):

    • ImageEncoder model

    • MaskDecoder model

All three default models can be replaced with custom fine-tuned models, which can be configured in the custom tracker configuration file set via the CV_PIPELINE_TRACKER_CONFIG environment variable.

To customize the models, follow the steps below:

  1. Mount the custom models: The custom models must be locally available on the host machine in the MODEL_ROOT_DIR directory, as sketched below.
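
A minimal sketch, assuming MODEL_ROOT_DIR is configured in the deployment's .env file (the host path below is a placeholder):

MODEL_ROOT_DIR=/home/user/vss_models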

  2. Provide fine-tuned ONNX models: Replace the existing models with your own fine-tuned ONNX models.

    This requires updating any of the following fields in the custom tracker configuration file set via the CV_PIPELINE_TRACKER_CONFIG environment variable:

...
ReID:
  ...
  onnxFile: <MODEL_ROOT_DIR>/<path to your custom reid onnx file>
  ...
Segmenter:
  ...
  ImageEncoder:
    ...
    onnxFile: <MODEL_ROOT_DIR>/<path to your custom image encoder onnx file>
    ...
  MaskDecoder:
    ...
    onnxFile: <MODEL_ROOT_DIR>/<path to your custom mask decoder onnx file>
    ...

Note that the engine files are generated internally by the VSS container. Hence, do not change the modelEngineFile field, which contains dummy paths.

  3. Provide engine files: Alternatively, provide engine files directly. If engine files are provided, they take precedence over ONNX files.

...
ReID:
  ...
  modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom reid engine file>
  ...
Segmenter:
  ...
  ImageEncoder:
    ...
    modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom image encoder engine file>
    ...
  MaskDecoder:
    ...
    modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom mask decoder engine file>
    ...

Customizing CV Pipeline Chunks#

The CV pipeline can be configured to run multiple chunks of video on the same GPU. By default, the CV pipeline runs a maximum of 2 chunks of video on the same GPU. This can be customized by setting the NUM_CV_CHUNKS_PER_GPU parameter. Increasing NUM_CV_CHUNKS_PER_GPU gives better performance at the cost of increased GPU utilization and memory consumption. Recommended values are 4 and 6 for increased performance. The NUM_CV_CHUNKS_PER_GPU parameter can be configured in the .env file as shown below.

NUM_CV_CHUNKS_PER_GPU=4

Recommendations for CV Pipeline Configuration#

We recommend using one of the following two configurations for the CV pipeline:

  • Performance mode: This is the default configuration and is optimized for performance. It uses the following settings:

    • GDINO_INFERENCE_INTERVAL = 1

    • Use a lightweight backbone like Swin-Tiny in the Grounding DINO detector

    • Tracker configuration: default_tracker_config.yml

    • Models used by tracker: Re-identification model

The metadata generated by the CV pipeline consists of bounding boxes, object IDs, and object types.

  • Accuracy mode: This configuration can be used for higher accuracy detection and tracking. It can be configured using the following settings:

    • GDINO_INFERENCE_INTERVAL = 0

    • Use a high accuracy backbone like SwinB in the Grounding DINO detector

    • Tracker configuration: config_tracker_NvDCF_accuracy_SAM2.yml

    • Models used by tracker: Re-identification model and SAM2 Segmenter models

The metadata generated by the CV pipeline additionally includes object masks along with bounding boxes, object IDs, and object types.

In accuracy mode, object masks are overlaid on the video and provided as input to the VLM. To disable the mask overlay, change segmenterType: 1 to segmenterType: 0 in config_tracker_NvDCF_accuracy_SAM2.yml.
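
For Docker Compose deployments, accuracy mode can be sketched as the following set of .env entries combining the settings above (paths are placeholders, and a locally available SwinB Grounding DINO ONNX model is assumed):

GDINO_MODEL_PATH=<MODEL_ROOT_DIR>/<path_to_swinb_gdino_model>.onnx
GDINO_INFERENCE_INTERVAL=0
CV_PIPELINE_TRACKER_CONFIG=<path_to_config_tracker_NvDCF_accuracy_SAM2.yml>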

Example: Step-by-Step Guide to Customize the CV Pipeline to High Accuracy Mode#

As mentioned in the previous section, the VSS CV pipeline defaults to performance mode. This section provides the steps for customizing the CV pipeline to high accuracy mode.

  1. Clear the PVC cache to delete stale TensorRT engines using the following command:

sudo microk8s kubectl delete pvc vss-ngc-model-cache-pvc
  2. The VLM that gives the best accuracy is GPT-4o. Hence, set the OpenAI API key secret:

export OPENAI_API_KEY=<your_openai_api_key>
sudo microk8s kubectl create secret generic openai-api-key-secret --from-literal=OPENAI_API_KEY=$OPENAI_API_KEY
  3. Generate the swinb.onnx file:

  4. Download the overrides_cv_accuracy_mode.yaml file and change the mount path in overrides_cv_accuracy_mode.yaml to the host folder containing swinb.onnx, at the following location:

extraPodVolumes:
  - name: gdino
    hostPath:
      path: <GDINO_MODEL_DIR>   # Path on host
  5. Deploy using Helm with the overrides_cv_accuracy_mode.yaml file:

sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.3.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f overrides_cv_accuracy_mode.yaml

You can refer to Deploy Using Helm for more details on deployment status and accessing the VSS UI.
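
To check that all pods are up before proceeding, you can run, for example:

sudo microk8s kubectl get pods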

  6. Once all the pods are running, open the VSS UI and follow these steps:

    1. Select the “its” (traffic) sample video

    2. Tick the “Enable CV Metadata” checkbox

    3. Set the CV pipeline prompt to: vehicle . truck;0.3

    4. Set the prompt to: Traffic Camera Video (With “Enable CV Metadata” selected)

    5. Click on the “Summarize” button.

The above steps are shown in the screenshot below:

  7. The summarization process will start in VSS. Once the process completes, the VSS UI is updated with the summary as well as the Set-Of-Marks overlay video as shown below. The video contains the IDs and masks overlaid on the objects. Note that the Set-Of-Marks video contains only the frames sampled for the VLM, not all the frames of the input video.

  8. Based on the summary and the Set-Of-Marks video, the user can ask questions in the chat. One sample question is:

Do you see any abnormal events in the video clip? If so, which cars are involved?

You should see an answer similar to the one shown below: