VSS Customization#
There are several ways to customize the VSS blueprint before deployment:
Use Configuration Options to override the default deployment parameters.
Update the values.yaml file of the different subcharts of the VSS blueprint.
Customize the VSS container image with the required implementations and use the updated image in the Helm Chart.
The following sections explain these approaches in detail.
Configurable Parameters#
See Configuration Options for the list of initialization time parameters and their details.
At runtime, the /summarize API supports the following parameters. Refer to the API schema for details; it is available at http://<VSS_API_ENDPOINT>/docs after VSS is deployed.
temperature
seed
top_p
max_tokens
top_k
chunk_duration
chunk_overlap_duration (File only)
summary_duration (Live stream only)
prompt (VLM prompt)
caption_summarization_prompt and summary_aggregation_prompt (Summarization prompts)
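The runtime parameters above are sent as a JSON request body. The following Python sketch assembles one and rejects parameters that do not apply to the source type (chunk_overlap_duration is file-only, summary_duration is live-stream-only). It assumes each parameter name maps directly to a JSON key and that the file or stream is referenced by an "id" field; build_summarize_body and post_summarize are hypothetical helpers, and the actual schema at http://<VSS_API_ENDPOINT>/docs may require additional fields. Only the standard library is used.

```python
import json
import urllib.request

# Assumption: every runtime parameter maps 1:1 to a JSON key in the
# /summarize request body. Confirm against http://<VSS_API_ENDPOINT>/docs.
COMMON_PARAMS = {
    "temperature", "seed", "top_p", "max_tokens", "top_k",
    "chunk_duration", "prompt",
    "caption_summarization_prompt", "summary_aggregation_prompt",
}
FILE_ONLY = {"chunk_overlap_duration"}
LIVE_ONLY = {"summary_duration"}


def build_summarize_body(video_id, is_live=False, **params):
    """Assemble a /summarize request body (hypothetical helper).

    Assumption: the file or stream is referenced by an "id" field.
    Raises ValueError for parameters that do not apply to the source type.
    """
    allowed = COMMON_PARAMS | (LIVE_ONLY if is_live else FILE_ONLY)
    unexpected = set(params) - allowed
    if unexpected:
        raise ValueError(f"not valid for this source: {sorted(unexpected)}")
    return {"id": video_id, **params}


def post_summarize(endpoint, body, timeout=600):
    """POST body to http://<endpoint>/summarize and return the decoded JSON."""
    req = urllib.request.Request(
        f"http://{endpoint}/summarize",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Under these assumptions, post_summarize("<VSS_API_ENDPOINT>", build_summarize_body("<VIDEO_ID>", chunk_duration=60, prompt="...")) would submit a file summarization request. Validating locally first surfaces a misplaced live-only or file-only parameter before the request reaches the server.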
Tuning Prompts#
VLM prompts need to be specific to the use case: the prompt must name the specific events to look for. The summarization prompts used by CA-RAG also need to be tuned for the use case. All three prompts (prompt, caption_summarization_prompt, and summary_aggregation_prompt) can be specified in the /summarize API.
Warehouse prompt configuration example:
Prompt Type | Example Prompt | Guidelines |
---|---|---|
caption | “Write a concise and clear dense caption for the provided warehouse video, focusing on irregular or hazardous events such as boxes falling, workers not wearing PPE, workers falling, workers taking photographs, workers chitchatting, or forklift stuck. Start and end each sentence with a time stamp.” | |
caption_summarization | “You should summarize the following events of a warehouse in the format start_time:end_time:caption. If during a time segment only regular activities happen, then ignore them, else note any irregular activities in detail. The output should be bullet points in the format start_time:end_time: detailed_event_description. Don’t return anything else except the bullet points.” | |
summary_aggregation | “You are a warehouse monitoring system. Given the caption in the form start_time:end_time: caption, Aggregate the following captions in the format start_time:end_time:event_description. The output should only contain bullet points. Cluster the output into Unsafe Behavior, Operational Inefficiencies, Potential Equipment Damage, and Unauthorized Personnel.” | |
caption (prompt for JSON output) | “Find out all the irregular or hazardous events such as boxes falling, workers not wearing PPE, workers falling, workers taking photographs, workers chitchatting, or forklift stuck. Fill the following JSON format with the event information: { "all_events": [ {"event": "<event caption>", "start_time": <start time of event>, "end_time": <end time of the event>}]}. Reply only with JSON output.” | |
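With the JSON-output caption prompt, the VLM response still arrives as text. The following Python sketch turns it into sorted event tuples, assuming the model follows the all_events format in the table and returns numeric start and end times; parse_events is a hypothetical helper, not part of VSS. Because models sometimes add text around the JSON, the sketch decodes only the span from the first { to the last }.

```python
import json


def parse_events(vlm_response):
    """Parse an 'all_events' JSON reply into (start, end, event) tuples.

    Assumes start_time and end_time are numbers. Text before the first
    '{' or after the last '}' (e.g. a chatty preamble) is ignored.
    """
    start = vlm_response.find("{")
    end = vlm_response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    data = json.loads(vlm_response[start:end + 1])
    events = [
        (float(item["start_time"]), float(item["end_time"]), item["event"])
        for item in data.get("all_events", [])
    ]
    return sorted(events)  # chronological by start time
```

Keeping the parser tolerant of surrounding text is a design choice: even with "Reply only with JSON output." in the prompt, a model may occasionally add a preamble, and failing loudly (ValueError or json.JSONDecodeError) is preferable to silently dropping events.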
Accessing Milvus Vector DB#
VSS uses a Milvus vector DB to store the intermediate per-chunk VLM responses before CA-RAG aggregates and summarizes them.
The VSS blueprint deploys a Milvus vector DB. To access it, change the Milvus service type to NodePort:
kubectl patch svc milvus-milvus-deployment-milvus-service -p '{"spec":{"type":"NodePort"}}'
kubectl get svc milvus-milvus-deployment-milvus-service # Get the NodePort
milvus-milvus-deployment-milvus-service NodePort 10.152.183.217 <none> 19530:30723/TCP,9091:31482/TCP 96m
Note
If using microk8s, prepend the kubectl commands with sudo microk8s. For example, sudo microk8s kubectl ...
In this example, the Milvus service is reachable at <NODE_IP>:30723. You can use standard Milvus tools such as milvus_cli or the Milvus Python SDK (pymilvus) to interact with the Milvus DB.
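To script the lookup instead of reading the NodePort by eye, the following Python sketch extracts the NodePort mapped to Milvus port 19530 from a kubectl get svc row like the one above, then opens a client with the Milvus Python SDK's MilvusClient. The helper names milvus_node_port and connect_milvus are hypothetical; pymilvus is imported only when connect_milvus is called, so the parsing part needs only the standard library.

```python
def milvus_node_port(svc_line, service_port=19530):
    """Return the NodePort mapped to service_port in one `kubectl get svc` row.

    Columns: NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE, where PORT(S)
    looks like "19530:30723/TCP,9091:31482/TCP".
    """
    ports = svc_line.split()[4]
    for mapping in ports.split(","):
        port_pair = mapping.split("/")[0]  # drop the "/TCP" protocol suffix
        if ":" not in port_pair:
            continue  # no NodePort assigned for this port
        port, node_port = port_pair.split(":")
        if int(port) == service_port:
            return int(node_port)
    raise ValueError(f"port {service_port} not found in {ports!r}")


def connect_milvus(node_ip, node_port):
    """Open a Milvus client (requires `pip install pymilvus`)."""
    from pymilvus import MilvusClient  # optional dependency, imported lazily
    return MilvusClient(uri=f"http://{node_ip}:{node_port}")
```

For example, connect_milvus("<NODE_IP>", 30723).list_collections() would list the collections in the deployed Milvus instance.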
VSS stores the per-chunk metadata and the per-chunk VLM response in the vector DB. The VLM response is stored as-is, as a string; it is not parsed or stored as structured data. The metadata includes the start and end times of the chunk and the chunk index. The final aggregated summarization response from CA-RAG is not stored.
Custom Post-Processing Functions#
The VLM output is stored in the Milvus vector DB. To implement custom post-processing functions, connect to the Milvus vector DB and use the information stored in it. For details, refer to Accessing Milvus Vector DB.
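As an example of a custom post-processing function, the following Python sketch flags chunks whose stored VLM response mentions keywords of interest and returns them in chunk order with their time range. It operates on plain dicts, such as the list that a pymilvus MilvusClient.query call returns; the field names chunk_idx, start_time, end_time, and text are hypothetical placeholders, so inspect the actual VSS collection schema before relying on them.

```python
# Hypothetical record shape: the keys "chunk_idx", "start_time", "end_time"
# and "text" are placeholders for the real VSS collection fields.
def flag_chunks(records, keywords):
    """Return (chunk_idx, start, end, matched_keywords) for each chunk whose
    VLM response mentions any keyword (case-insensitive), in chunk order."""
    lowered = [k.lower() for k in keywords]
    flagged = []
    for rec in sorted(records, key=lambda r: r["chunk_idx"]):
        text = rec["text"].lower()
        hits = [k for k in lowered if k in text]
        if hits:
            flagged.append((rec["chunk_idx"], rec["start_time"], rec["end_time"], hits))
    return flagged
```

Because the VLM response is stored as an unstructured string, simple substring matching like this is the most direct option; structured filtering needs a prompt that makes the VLM emit a parseable format, such as the JSON-output prompt above.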
CA-RAG Configuration#
VSS CA-RAG can be configured using a config file.
Here’s an example configuration for Summarization:
summarization:
  enable: true
  method: "batch"
  llm:
    model: "meta/llama-3.1-70b-instruct"
    base_url: "http://localhost:8000/v1"
    max_tokens: 2048
    temperature: 0.2
    top_p: 0.7
  embedding:
    model: "nvidia/nv-embedqa-e5-v5"
    base_url: "http://localhost:8000/v1"
  params:
    batch_size: 5
    batch_max_concurrency: 20
  prompts:
    caption: "Write a concise and clear dense caption for the provided warehouse video, focusing on irregular or hazardous events such as boxes falling, workers not wearing PPE, workers falling, workers taking photographs, workers chitchatting, forklift stuck, etc. Start and end each sentence with a time stamp."
    caption_summarization: "You should summarize the following events of a warehouse in the format start_time:end_time:caption. For start_time and end_time use . to separate seconds, minutes, hours. If during a time segment only regular activities happen, then ignore them, else note any irregular activities in detail. The output should be bullet points in the format start_time:end_time: detailed_event_description. Don't return anything else except the bullet points."
    summary_aggregation: "You are a warehouse monitoring system. Given the caption in the form start_time:end_time: caption, Aggregate the following captions in the format start_time:end_time:event_description. If the event_description is the same as another event_description, aggregate the captions in the format start_time1:end_time1,...,start_timek:end_timek:event_description. If any two adjacent end_time1 and start_time2 is within a few tenths of a second, merge the captions in the format start_time1:end_time2. The output should only contain bullet points. Cluster the output into Unsafe Behavior, Operational Inefficiencies, Potential Equipment Damage and Unauthorized Personnel"
The meaning of the attributes is as follows:
enable: Enables summarization. Default: true
method: Can be batch or refine. Refer to Summarization for more details about each method. Default: batch
batch_size: For method batch, the batch size used when combining a batch summary. Default: 5
batch_max_concurrency: For method batch, the number of batches processed in parallel. Default: 20
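To illustrate what batch_size and batch_max_concurrency control, the following Python sketch splits per-chunk captions into consecutive batches of batch_size and summarizes up to batch_max_concurrency batches in parallel. This is a conceptual model built on the two documented parameters and their defaults, not CA-RAG's actual implementation; make_batches and summarize_batches are hypothetical, and summarize_fn stands in for the LLM call.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(captions, batch_size=5):
    """Split per-chunk captions into consecutive batches of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [captions[i:i + batch_size] for i in range(0, len(captions), batch_size)]


def summarize_batches(captions, summarize_fn, batch_size=5, batch_max_concurrency=20):
    """Run summarize_fn on each batch, at most batch_max_concurrency at a time.

    pool.map returns results in batch order, so the batch summaries stay in
    time order for a later aggregation step.
    """
    batches = make_batches(captions, batch_size)
    with ThreadPoolExecutor(max_workers=batch_max_concurrency) as pool:
        return list(pool.map(summarize_fn, batches))
```

With the defaults, 12 chunk captions yield 3 batches (5, 5, and 2 captions), all of which fit within the concurrency limit of 20. Raising batch_size reduces the number of LLM calls but makes each prompt longer.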
Here’s an example configuration for Q&A:
chat:
  rag: graph-rag # graph-rag or vector-rag
  llm:
    model: "gpt-4o"
    temperature: 0
  embedding:
    model: "nvidia/nv-embedqa-e5-v5"
    base_url: "http://localhost:8000/v1"
  reranker:
    model: "nvidia/nv-rerankqa-mistral-4b-v3"
    base_url: "http://localhost:8000/v1"
The attributes are:
rag: Can be graph-rag or vector-rag. Refer to Question and Answer for more details about each option. Default: graph-rag
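Before packaging, a quick check of the edited configuration can catch typos in the enumerated values. The following Python sketch fills in the defaults documented above and rejects unsupported method and rag values or non-positive batch parameters. It expects the already-parsed YAML as a dict (for example from PyYAML's yaml.safe_load); check_ca_rag_config is a hypothetical helper and covers only the attributes described in this section.

```python
# Defaults and allowed values as documented in this section.
SUMMARIZATION_DEFAULTS = {"enable": True, "method": "batch"}
BATCH_PARAM_DEFAULTS = {"batch_size": 5, "batch_max_concurrency": 20}
ALLOWED_METHODS = {"batch", "refine"}
ALLOWED_RAG = {"graph-rag", "vector-rag"}


def check_ca_rag_config(cfg):
    """Return a new dict with the documented defaults filled in.

    Raises ValueError for unsupported method/rag values or non-positive
    batch parameters. Keys not covered here are passed through unchanged.
    """
    summ = {**SUMMARIZATION_DEFAULTS, **cfg.get("summarization", {})}
    if summ["method"] not in ALLOWED_METHODS:
        raise ValueError(f"summarization.method must be one of {sorted(ALLOWED_METHODS)}")

    params = {**BATCH_PARAM_DEFAULTS, **summ.get("params", {})}
    for key in BATCH_PARAM_DEFAULTS:
        value = params[key]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"summarization.params.{key} must be a positive integer")
    summ["params"] = params

    chat = {"rag": "graph-rag", **cfg.get("chat", {})}
    if chat["rag"] not in ALLOWED_RAG:
        raise ValueError(f"chat.rag must be one of {sorted(ALLOWED_RAG)}")

    return {**cfg, "summarization": summ, "chat": chat}
```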
To modify this configuration, update the ca_rag_config.yaml section in nvidia-blueprint-vss/charts/vss/values.yaml of the VSS Blueprint as required before deploying the Helm Chart. The endpoints are already configured to use the models deployed as part of the Helm Chart.
Overview of the steps:
1. Extract the Helm Chart:
   tar -xzf nvidia-blueprint-vss-2.2.0.tgz
2. Open the values.yaml file in an editor of your choice:
   vi nvidia-blueprint-vss/charts/vss/values.yaml
3. Find the config file content by searching for “ca_rag_config.yaml”. It is under the configs: section. Change the CA-RAG configurations of interest.
4. Create the Helm Chart tarball with the updated config:
   tar -czf nvidia-blueprint-vss-2.2.0.tgz nvidia-blueprint-vss
5. Deploy the new Helm Chart following the instructions at Deploy the Blueprint.
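The search and repack steps can also be scripted. The following Python sketch uses the standard library's tarfile module: find_config_line reports where a key such as ca_rag_config.yaml first appears in values.yaml, and repack_chart produces the same top-level directory layout as running tar -czf nvidia-blueprint-vss-2.2.0.tgz nvidia-blueprint-vss from the chart's parent directory. Both helper names are hypothetical, and the editing itself is still left to you.

```python
import os
import tarfile


def find_config_line(values_path, key="ca_rag_config.yaml"):
    """Return the 1-based line number where key first appears in values.yaml."""
    with open(values_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if key in line:
                return lineno
    raise KeyError(f"{key} not found in {values_path}")


def repack_chart(chart_dir, out_tgz):
    """Gzip-compress chart_dir into out_tgz, like `tar -czf out_tgz chart_dir`.

    Members keep the top-level directory name (e.g. nvidia-blueprint-vss/...),
    matching the layout of the extracted chart.
    """
    chart_dir = os.path.normpath(chart_dir)
    with tarfile.open(out_tgz, "w:gz") as tar:
        tar.add(chart_dir, arcname=os.path.basename(chart_dir))
```

For example, repack_chart("nvidia-blueprint-vss", "nvidia-blueprint-vss-2.2.0.tgz") mirrors the tar -czf command above.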
Tuning Guardrails#
VSS supports Guardrails for user input and provides a default Guardrails configuration. VSS uses NVIDIA NeMo Guardrails to provide this functionality.
To modify the Guardrails configuration, update the guardrails_config.yaml section in the nvidia-blueprint-vss/charts/vss/values.yaml file of the VSS Helm Chart.
Refer to the NeMo Guardrails General instructions to update that section.
Overview of the steps:
1. Extract the Helm Chart:
   tar -xzf nvidia-blueprint-vss-2.2.0.tgz
2. Open the values.yaml file in an editor of your choice:
   vi nvidia-blueprint-vss/charts/vss/values.yaml
3. Find the config file content by searching for “guardrails_config.yaml”. It is under the configs: section. Change the Guardrails configurations of interest.
4. Create the Helm Chart tarball with the updated config:
   tar -czf nvidia-blueprint-vss-2.2.0.tgz nvidia-blueprint-vss
5. Deploy the new Helm Chart following the instructions at Deploy the Blueprint.
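After repacking, it can be worth confirming that the tarball you are about to deploy actually contains your Guardrails edit. The following Python sketch assumes the chart layout used in the commands above; read_from_chart and chart_contains are hypothetical helpers, and expected_snippet is any text you changed in the guardrails_config.yaml section.

```python
import tarfile

# Path of values.yaml inside the packed chart, per the layout used above.
VALUES_MEMBER = "nvidia-blueprint-vss/charts/vss/values.yaml"


def read_from_chart(tgz_path, member=VALUES_MEMBER):
    """Return the text of one file inside a packed Helm Chart tarball."""
    with tarfile.open(tgz_path, "r:gz") as tar:
        f = tar.extractfile(member)  # raises KeyError if member is missing
        if f is None:
            raise KeyError(f"{member} is not a regular file in {tgz_path}")
        return f.read().decode("utf-8")


def chart_contains(tgz_path, expected_snippet, member=VALUES_MEMBER):
    """True if the packed values.yaml contains expected_snippet."""
    return expected_snippet in read_from_chart(tgz_path, member)
```

A check like this catches the common mistake of editing the extracted files but deploying a stale tarball.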