CV Pipeline Customization#
The VSS CV pipeline consists of three main customizable components: an object detector, a tracker, and the CV models used inside the tracker. Each component can be customized with a fine-tuned version based on the use case. The pipeline can also be customized for higher performance by running multiple chunks of video on the same GPU. These customizations can be done before deployment in the following ways:
For Deploy Using Helm, update the overrides.yaml file as described below in the section Customizations in Helm Chart Deployment.
For Deploy Using Docker Compose, set the appropriate environment variables in the .env file as described below in the section CV Customizations in Docker Compose Deployment.
Customizations in Helm Chart Deployment#
Prerequisites#
The engine files for the models are generated when the VSS container is initialized. These engine files are cached in NGC_MODEL_CACHE.
If you are configuring a custom model, make sure to delete any stale TensorRT engines in NGC_MODEL_CACHE before restarting the VSS container.
This can be done by running the following command:
sudo microk8s kubectl delete pvc vss-ngc-model-cache-pvc
Customizing the Detector#
The Grounding DINO detector can be customized to use a locally available ONNX model. The CV pipeline will generate the engine file for the custom model. It can also be customized to run at a specific frame interval. Here are the steps to customize the Grounding DINO detector:
Add the GDINO_MODEL_PATH environment variable, extraPodVolumes, and extraPodVolumeMounts to the overrides file described in Enabling CV Pipeline: Set-Of-Marks (SOM) & Metadata as shown below.
vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
            - name: GDINO_MODEL_PATH
              value: "/models/gdino/<your_finetuned_gdino_model>.onnx"
  extraPodVolumes:
    - name: gdino
      hostPath:
        path: <GDINO_MODEL_DIR> # Path on host
  extraPodVolumeMounts:
    - name: gdino
      mountPath: /models/gdino
The detector can also be customized to run at a specific frame interval by setting GDINO_INFERENCE_INTERVAL as shown in Enabling CV Pipeline: Set-Of-Marks (SOM) & Metadata. The value of GDINO_INFERENCE_INTERVAL is set to 1 by default, i.e. Grounding DINO runs inference on every alternate frame. For maximum accuracy, set GDINO_INFERENCE_INTERVAL to 0 so that Grounding DINO runs inference on every frame.
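For example, a minimal overrides sketch that sets the interval to 0 could look like the following; the structure mirrors the env entries shown above and should be merged into your existing overrides file:
vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
            - name: GDINO_INFERENCE_INTERVAL
              value: "0"   # 0 = inference on every frame, 1 (default) = every alternate frame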
Customizing the Tracker#
The default tracker configuration is provided with the VSS blueprint and can be accessed in two ways:
cv_pipeline_tracker_config.yml in nvidia-blueprint-vss/charts/vss/values.yaml of the VSS Blueprint
Included inside the VSS container at /opt/nvidia/via/config/default_tracker_config.yml
The default tracker configuration is tuned for performance, with object segmentation disabled. For more details about the tracker configuration parameters, please refer to NvMultiObjectTracker Library.
A custom tracker configuration for both higher accuracy tracking and SAM2 object segmentation is also provided as part of the VSS blueprint. This higher accuracy configuration is included inside the VSS container at /opt/nvidia/via/config/config_tracker_NvDCF_accuracy_SAM2.yml.
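To view its contents for pasting into the overrides file, you can, for example, print the file from a running VSS pod (the pod name below is a placeholder; look it up with kubectl get pods):
sudo microk8s kubectl exec -it <vss-pod-name> -- cat /opt/nvidia/via/config/config_tracker_NvDCF_accuracy_SAM2.yml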
To use the higher accuracy tracker configuration, copy the contents of config_tracker_NvDCF_accuracy_SAM2.yml
to the overrides file described in Enabling CV Pipeline: Set-Of-Marks (SOM) & Metadata as shown below.
vss:
  configs:
    cv_pipeline_tracker_config.yml:
      <Paste the contents of config_tracker_NvDCF_accuracy_SAM2.yml here>
  applicationSpecs:
    vss-deployment:
      ...
For finer control, individual parameters can be modified in the cv_pipeline_tracker_config.yml section of the overrides file.
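For instance, a single parameter could be adjusted as in the sketch below; the parameter name and value are illustrative only, refer to NvMultiObjectTracker Library for the actual fields and their valid ranges:
vss:
  configs:
    cv_pipeline_tracker_config.yml:
      ...
      BaseConfig:
        minDetectorConfidence: 0.3   # illustrative parameter and value
      ...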
Customizing Models in the Tracker#
The tracker internally uses three models:
Re-identification model for higher accuracy tracking as mentioned in VSS CV Pipeline Models
SAM2 Segmenter for segmentation as mentioned in VSS CV Pipeline Models. SAM2 internally uses the following models (the memory bank is not supported yet):
ImageEncoder model
MaskDecoder model
All three default models can be replaced with custom fine-tuned models. These can be configured in the cv_pipeline_tracker_config.yml section of the overrides file as described below.
1. Mount the custom models: The custom models should be locally available on the host machine. Mount the host machine paths by adding extraPodVolumes and extraPodVolumeMounts to the overrides file as shown below.
vss:
  extraPodVolumes:
    - name: tracker-models
      hostPath:
        path: <TRACKER_MODELS_DIR> # Path on host
  extraPodVolumeMounts:
    - name: tracker-models
      mountPath: /models/tracker_models
  applicationSpecs:
    vss-deployment:
      ...
2. Provide fine-tuned ONNX models: Replace the existing models with your own fine-tuned ONNX models.
This requires updating any of the following fields in the cv_pipeline_tracker_config.yml
section:
vss:
configs:
cv_pipeline_tracker_config.yml:
...
ReID:
onnxFile: /models/tracker_models/<path to your custom reid onnx file>
...
Segmenter:
...
ImageEncoder:
onnxFile: /models/tracker_models/<path to your custom image encoder onnx file>
...
MaskDecoder:
onnxFile: /models/tracker_models/<path to your custom mask decoder onnx file>
...
Please note that the engine files will be generated internally by the VSS container. Hence, do not change the modelEngineFile field, which contains dummy paths.
3. Provide engine files: Alternatively, provide pre-built engine files. If engine files are provided, they take precedence over ONNX files.
vss:
configs:
cv_pipeline_tracker_config.yml:
...
ReID:
modelEngineFile: /models/tracker_models/<path to your custom reid engine file>
...
Segmenter:
...
ImageEncoder:
modelEngineFile: /models/tracker_models/<path to your custom image encoder engine file>
...
MaskDecoder:
modelEngineFile: /models/tracker_models/<path to your custom mask decoder engine file>
...
Customizing CV Pipeline Chunks#
The CV pipeline can be configured to run multiple chunks of video on the same GPU. By default, the CV pipeline runs a maximum of 2 chunks of video on the same GPU. This can be customized by setting the NUM_CV_CHUNKS_PER_GPU parameter. Increasing NUM_CV_CHUNKS_PER_GPU gives better performance at the cost of increased GPU utilization and memory consumption. Recommended values for increased performance are 4 and 6.
The parameter NUM_CV_CHUNKS_PER_GPU
can be configured in the overrides file as shown below.
vss:
applicationSpecs:
vss-deployment:
containers:
vss:
env:
- name: NUM_CV_CHUNKS_PER_GPU
value: "4"
...
CV Customizations in Docker Compose Deployment#
Customizing the Detector#
The Grounding DINO detector can be customized to use a locally available ONNX model. The CV pipeline will generate the engine file for the custom model. It can also be customized to run at a specific frame interval. Here are the steps to customize the Grounding DINO detector:
Set the GDINO_MODEL_PATH environment variable to point to the ONNX model file. Please note that the model file should be present in the MODEL_ROOT_DIR directory, which is mounted into the Docker container.
GDINO_MODEL_PATH=<MODEL_ROOT_DIR>/<path_to_your_finetuned_gdino_model>.onnx
The detector can also be customized to run at a specific frame interval by setting GDINO_INFERENCE_INTERVAL. The value of GDINO_INFERENCE_INTERVAL is set to 1 by default, i.e. Grounding DINO runs inference on every alternate frame. For maximum accuracy, set GDINO_INFERENCE_INTERVAL to 0 so that Grounding DINO runs inference on every frame, e.g.
GDINO_INFERENCE_INTERVAL=0
Customizing the Tracker#
The default tracker configuration is provided with the VSS blueprint and can be accessed inside the VSS container at /opt/nvidia/via/config/default_tracker_config.yml.
The default tracker configuration is tuned for performance. For more details about the tracker configuration parameters, please refer to NvMultiObjectTracker Library.
A custom tracker configuration for both higher accuracy tracking and SAM2 object segmentation is also provided as part of the VSS blueprint. This higher accuracy configuration is included inside the VSS container at /opt/nvidia/via/config/config_tracker_NvDCF_accuracy_SAM2.yml.
To use the higher accuracy tracker configuration, copy the file config_tracker_NvDCF_accuracy_SAM2.yml to the host and set its path in the CV_PIPELINE_TRACKER_CONFIG environment variable.
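For example, the file could be copied out of a running VSS container with docker cp (the container name is a placeholder; use the name from your Docker Compose deployment):
docker cp <vss_container_name>:/opt/nvidia/via/config/config_tracker_NvDCF_accuracy_SAM2.yml ./config_tracker_NvDCF_accuracy_SAM2.yml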
CV_PIPELINE_TRACKER_CONFIG=<path_to_config_tracker_NvDCF_accuracy_SAM2.yml>
Please note that this is just one example of a tracker configuration. You can create your own tracker configuration by referring to the NvMultiObjectTracker Library and modifying the parameters as per your requirements.
Customizing Models in the Tracker#
The tracker internally uses three models:
Re-identification model for higher accuracy tracking as mentioned in VSS CV Pipeline Models
SAM2 Segmenter for segmentation as mentioned in VSS CV Pipeline Models. SAM2 internally uses the following models (the memory bank is not supported yet):
ImageEncoder model
MaskDecoder model
All three default models can be replaced with custom fine-tuned models. These can be configured in the custom tracker configuration file that is set using the CV_PIPELINE_TRACKER_CONFIG environment variable.
To customize the models, follow the steps below:
1. Mount the custom models: The custom models should be locally available on the host machine in the MODEL_ROOT_DIR directory.
2. Provide fine-tuned ONNX models: Replace the existing models with your own fine-tuned ONNX models. This requires updating any of the following fields in the custom tracker configuration file set using the CV_PIPELINE_TRACKER_CONFIG environment variable:
...
ReID:
...
onnxFile: <MODEL_ROOT_DIR>/<path to your custom reid onnx file>
...
Segmenter:
...
ImageEncoder:
...
onnxFile: <MODEL_ROOT_DIR>/<path to your custom image encoder onnx file>
...
MaskDecoder:
...
onnxFile: <MODEL_ROOT_DIR>/<path to your custom mask decoder onnx file>
...
Please note that the engine files will be generated internally by the VSS container. Hence, do not change the modelEngineFile field, which contains dummy paths.
3. Provide engine files: Alternatively, provide pre-built engine files. If engine files are provided, they take precedence over ONNX files.
...
ReID:
...
modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom reid engine file>
...
Segmenter:
...
ImageEncoder:
...
modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom image encoder engine file>
...
MaskDecoder:
...
modelEngineFile: <MODEL_ROOT_DIR>/<path to your custom mask decoder engine file>
...
Customizing CV Pipeline Chunks#
The CV pipeline can be configured to run multiple chunks of video on the same GPU. By default, the CV pipeline runs a maximum of 2 chunks of video on the same GPU. This can be customized by setting the NUM_CV_CHUNKS_PER_GPU parameter. Increasing NUM_CV_CHUNKS_PER_GPU gives better performance at the cost of increased GPU utilization and memory consumption. Recommended values for increased performance are 4 and 6.
The parameter NUM_CV_CHUNKS_PER_GPU
can be configured in the .env file as shown below.
NUM_CV_CHUNKS_PER_GPU=4
Recommendations for CV Pipeline Configuration#
We recommend using two sets of configurations for the CV Pipeline:
Performance mode: This is the default configuration and is optimized for performance. It uses the following settings:
GDINO_INFERENCE_INTERVAL = 1
Use a lightweight backbone like Swin-Tiny in the Grounding DINO detector
Tracker configuration: default_tracker_config.yml
Models used by tracker: Re-identification model
The metadata generated by the CV pipeline will consist of bounding boxes, object IDs, and object types.
Accuracy mode: This configuration can be used for higher accuracy detection and tracking. It can be configured using the following settings:
GDINO_INFERENCE_INTERVAL = 0
Use a high accuracy backbone like SwinB in the Grounding DINO detector
Tracker configuration: config_tracker_NvDCF_accuracy_SAM2.yml
Models used by tracker: Re-identification model and SAM2 Segmenter models
The metadata generated by the CV pipeline will additionally include object masks along with bounding boxes, object IDs, and object types.
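For a Docker Compose deployment, accuracy mode could be enabled with .env entries along the lines of the sketch below; the SwinB model file name and paths are placeholders, and each variable is described in the customization sections above. For Helm, the equivalent settings go in the overrides file as shown earlier.
GDINO_INFERENCE_INTERVAL=0
GDINO_MODEL_PATH=<MODEL_ROOT_DIR>/<swinb_gdino_model>.onnx
CV_PIPELINE_TRACKER_CONFIG=<path_to_config_tracker_NvDCF_accuracy_SAM2.yml>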
In accuracy mode, object masks are overlaid on the video and provided as input to the VLM. To disable the mask overlay, change segmenterType: 1 to segmenterType: 0 in config_tracker_NvDCF_accuracy_SAM2.yml.
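A sketch of the relevant part of the tracker configuration is shown below; the exact placement of segmenterType should be verified against the shipped config_tracker_NvDCF_accuracy_SAM2.yml file:
Segmenter:
  ...
  segmenterType: 0   # 0 disables the mask overlay; the accuracy config ships with 1
  ...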
Example: Step-by-Step Guide to Customize the CV Pipeline to High Accuracy Mode#
As mentioned in the previous section, the VSS CV pipeline defaults to performance mode. In this section, we provide the steps for customizing the CV pipeline to high accuracy mode.
Clear the PVC cache to delete stale TensorRT engines using the following command:
sudo microk8s kubectl delete pvc vss-ngc-model-cache-pvc
The VLM that gives the best accuracy is GPT-4o. Hence, set the OpenAI API key secret:
export OPENAI_API_KEY=<your_openai_api_key>
sudo microk8s kubectl create secret generic openai-api-key-secret --from-literal=OPENAI_API_KEY=$OPENAI_API_KEY
Generate the swinb.onnx file:
First obtain the PyTorch model from the Grounding DINO repo
Then export the model to ONNX format using TAO Toolkit (see the Exporting Grounding DINO guide for instructions)
Download the overrides_cv_accuracy_mode.yaml file and change the mount path in overrides_cv_accuracy_mode.yaml to the host folder containing swinb.onnx at the following location:
extraPodVolumes:
- name: gdino
hostPath:
path: <GDINO_MODEL_DIR> # Path on host
Deploy using Helm with the
overrides_cv_accuracy_mode.yaml
file:
sudo microk8s helm install vss-blueprint nvidia-blueprint-vss-2.3.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f overrides_cv_accuracy_mode.yaml
You can refer to Deploy Using Helm for more details on deployment status and accessing the VSS UI.
Once all the pods are running, open the VSS UI and follow these steps:
Select the “its” (traffic) video
Tick the “Enable CV Metadata” checkbox
Set the following CV pipeline prompt:
vehicle . truck;0.3
Set the prompt to: Traffic Camera Video (With “Enable CV Metadata” selected)
Click on the “Summarize” button.
The above steps are shown in the screenshot below:
The summarization process will start in VSS. Once the process completes, the VSS UI will be updated with the summary as well as the Set-Of-Marks overlay video as shown below. The video will contain the IDs and masks overlaid on the objects. Please note that the Set-Of-Marks video contains only the frames sampled for the VLM and not all the frames of the input video.
Based on the summary and Set-Of-Marks video, users can ask questions in the chat. One sample question is:
Do you see any abnormal events in the video clip? If so, which cars are involved?
You should see an answer similar to the one shown below: