Alert Verification Workflow

Two approaches for leveraging VLMs to generate alerts are showcased as part of the agent workflows:

Alert Verification: The VLM reviews video snippets for alerts generated upstream and verifies them. The original alerts come from object detection/tracking and behavior analytics microservices that process video streams in real time. Because the VLM is invoked only when a candidate alert arrives, this approach has lower GPU requirements, but it depends on an upstream entity to generate “candidate” alerts for verification.
Real-Time Alerts: The VLM continuously processes segments from a video source (for example, a camera) at periodic intervals based on a user-defined chunk duration. This approach leverages the generalizability of VLMs to trigger alerts for a broad set of cases (VLM fine-tuning or prompt tuning may be needed). However, it has higher GPU requirements due to more frequent VLM usage.

This section addresses the Alert Verification workflow; see Real-Time Alerts for the other approach.
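The difference in VLM load between the two approaches can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the deployment: the function names, the 10-second chunk duration, and the count of 12 candidate alerts are all assumptions chosen only for illustration.

```python
import math


def realtime_vlm_calls(stream_seconds: float, chunk_seconds: float) -> int:
    """VLM calls when every chunk of the stream is sent to the VLM."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    return math.ceil(stream_seconds / chunk_seconds)


def verification_vlm_calls(candidate_alerts: int) -> int:
    """VLM calls when only upstream candidate alerts are reviewed."""
    return candidate_alerts


# Illustrative numbers only: one hour of video, 10-second chunks,
# and 12 candidate alerts raised upstream during that hour.
ONE_HOUR = 3600
print(realtime_vlm_calls(ONE_HOUR, chunk_seconds=10))  # 360
print(verification_vlm_calls(candidate_alerts=12))     # 12
```

Under these assumed numbers, the real-time approach issues 30 times as many VLM requests per stream, which is why it needs more GPU capacity.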

Use Cases for Alert Verification

PPE compliance verification (hard hats, safety vests)
Restricted area monitoring
Asset presence/absence detection
Custom object detection scenarios

Estimated Deployment Time: 15-20 minutes

The following diagram illustrates the alert verification workflow architecture:

Key Features of the Alert Verification Agent:

RTVI CV for real-time object detection using Grounding DINO (open-vocabulary detection)
Behavior Analytics for rule-based and configurable alert generation from detection results
Alert Verification for VLM-based alert clip review to reduce false positives
Alert storage for querying and reporting
Report Generation
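The detect, alert, then verify pattern in the list above can be sketched in a few lines. This is a conceptual illustration only, not the Alert Verification microservice's code: `CandidateAlert`, `verify_alerts`, and the stub verifier are hypothetical names, and the status strings mirror the confirmed, rejected, and verification-failed outcomes mentioned under Known Issues.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CandidateAlert:
    """Hypothetical stand-in for an alert produced by Behavior Analytics."""
    alert_id: str
    rule: str       # hypothetical rule name, e.g. "restricted_area"
    clip_path: str  # video snippet that the VLM would review


def verify_alerts(
    candidates: List[CandidateAlert],
    vlm_confirms: Callable[[CandidateAlert], bool],
) -> Dict[str, str]:
    """Map each alert ID to a verification status based on the VLM verdict."""
    statuses: Dict[str, str] = {}
    for alert in candidates:
        try:
            verdict = vlm_confirms(alert)
            statuses[alert.alert_id] = "confirmed" if verdict else "rejected"
        except Exception:
            # Any verifier error is recorded instead of dropping the alert.
            statuses[alert.alert_id] = "verification-failed"
    return statuses


# Stub verifier with canned answers; a real one would call the VLM.
CANNED = {"a1": True, "a2": False}


def stub_vlm(alert: CandidateAlert) -> bool:
    return CANNED[alert.alert_id]  # raises KeyError for unknown alerts


alerts = [
    CandidateAlert("a1", "restricted_area", "clips/a1.mp4"),
    CandidateAlert("a2", "ppe_missing", "clips/a2.mp4"),
    CandidateAlert("a3", "ppe_missing", "clips/a3.mp4"),
]
print(verify_alerts(alerts, stub_vlm))
```

Only alerts whose status is confirmed would be surfaced to an operator; rejected ones are the false positives the verification stage is meant to filter out.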

What’s being deployed

NVStreamer: Video streaming service for dataset video playback, thereby replicating live cameras in a real-world deployment
Video IO & Storage (VIOS): Video ingestion (of NVStreamer video streams) supporting live streaming, recording, and playback features used by the agent
RTVI CV: Real-Time Video Intelligence CV Microservice for object detection that processes VIOS live streams to output metadata to Kafka
Behavior Analytics: Processes metadata from RTVI CV to generate alerts
Alert Verification: Verifies alert video clips using the VLM
RTVI VLM: Real-Time VLM Microservice for vision-language model inference used by Alert Verification
ELK: Elasticsearch, Logstash, and Kibana stack for log storage and analysis
VSS Agent: Agent service that uses a configured LLM endpoint to route requests and orchestrate tool calls to VSS microservices and model endpoints (LLM/VLM NIMs) to answer questions and generate outputs
Nemotron LLM (NIM): LLM inference service used for reasoning, tool selection, and response generation
Phoenix: Observability and telemetry service for agent workflow monitoring

Prerequisites

Before you begin, ensure all of the prerequisites are met. See Prerequisites for more details.

Note

For instructions on downloading sample data and the deployment package, see Download Sample Data and Deployment Package in the Quickstart guide.

If you have already completed those steps for another agent workflow, skip to Step 1: Deploy the Agent in the Deploy section below.

Service Endpoints

Once deployed, the following services are available:

Service	URL
VSS UI	`http://<HOST_IP>:7777`
Kibana UI	`http://<HOST_IP>:7777/kibana/app/home#/`
NVStreamer UI	`http://<HOST_IP>:31000/#/dashboard`
VST UI	`http://<HOST_IP>:30888/vst/#/dashboard`
Phoenix UI	`http://<HOST_IP>:7777/phoenix/projects`
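
As a convenience, the URLs in the table can be generated for a given host and probed from a script. The sketch below uses only the Python standard library; the helper names `endpoint_urls` and `is_reachable` are hypothetical, and treating any HTTP response below 500 as "reachable" is an assumption rather than a documented health check.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# URL templates copied from the Service Endpoints table.
ENDPOINTS = {
    "VSS UI": "http://{host}:7777",
    "Kibana UI": "http://{host}:7777/kibana/app/home#/",
    "NVStreamer UI": "http://{host}:31000/#/dashboard",
    "VST UI": "http://{host}:30888/vst/#/dashboard",
    "Phoenix UI": "http://{host}:7777/phoenix/projects",
}


def endpoint_urls(host: str) -> dict:
    """Fill in the host IP for every service URL."""
    return {name: url.format(host=host) for name, url in ENDPOINTS.items()}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers with a non-5xx HTTP status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except HTTPError as err:
        # 4xx still proves that something is listening on the port.
        return err.code < 500
    except (URLError, OSError):
        return False


# 192.0.2.10 is a documentation-only address; substitute your HOST_IP.
for name, url in endpoint_urls("192.0.2.10").items():
    print(f"{name}: {url}")
```

After deployment, calling `is_reachable(url)` for each generated URL gives a quick first check that the UIs are up before opening them in a browser.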

Next Steps

Once you’ve familiarized yourself with the alert verification workflow, you can explore:

Modifying the alert prompt in the Alerts Microservice configuration.
Adjusting rate limit settings to control alert verification frequency.
Configuring G-DINO prompting and class thresholds for custom detection scenarios.

Known Issues

Some VLM inaccuracies might be observed depending on the model and configuration used with the RTVI VLM container.
Video snippets generated for alerts may be short (for example, only a couple of seconds) depending on behavior analytics processing of the specific video, which could impact VLM accuracy. To address this issue, modify the fovCountViolationIncidentThreshold setting to the desired minimum alert clip duration in deploy/docker/developer-profiles/dev-profile-alerts/vss-behavior-analytics/configs/vss-behavior-analytics-config.json.
Video playback duration for verified alerts may not exactly match the alert timestamps.
Report generation may produce inaccurate results. As a potential workaround, remove the use_base64: true line under video_understanding_iso in developer-profiles/dev-profile-alerts/vss-agent/configs/config.yml.
If perception crashes and restarts, streams are not automatically re-added and alerts will not be generated.
For remote VLM and LLM deployments, the alert verification timeout may need to be increased from the default value of 5 seconds. See Alert Verification VLM Configuration Options for specific details.
VLM verdict verification can fail in the UI even when VLM requests are processed successfully. Alert Bridge response parsing accepts only raw verdicts such as A/B or Yes/No before serializing them to confirmed or rejected. As a result, a semantically valid rejected verdict from Cosmos Reason can fail schema validation and be persisted as verification-failed.
The VST UI is externally accessible on both ports 30000 and 30888 because both are exposed on the host network. For security hardening, consider using a firewall to allowlist only the required VST ingress port.
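
For the short-clip issue above, the fovCountViolationIncidentThreshold setting can be edited by hand or with a small script. The following sketch assumes nothing about where the key sits inside vss-behavior-analytics-config.json: it searches the whole JSON document and updates every occurrence. The helper names, the sample layout, and the value 5 are hypothetical, and this section does not state the unit of the setting, so confirm it before choosing a value.

```python
import json
from pathlib import Path


def set_key_everywhere(node, key, value) -> int:
    """Set `key` to `value` wherever it occurs in nested JSON data.

    Returns the number of occurrences that were updated.
    """
    updated = 0
    if isinstance(node, dict):
        for name in node:
            if name == key:
                node[name] = value
                updated += 1
            else:
                updated += set_key_everywhere(node[name], key, value)
    elif isinstance(node, list):
        for item in node:
            updated += set_key_everywhere(item, key, value)
    return updated


def update_config(path, key, value) -> int:
    """Rewrite the JSON file at `path` with `key` set to `value`."""
    config_path = Path(path)
    data = json.loads(config_path.read_text())
    updated = set_key_everywhere(data, key, value)
    if updated == 0:
        raise KeyError(f"{key!r} not found in {config_path}")
    config_path.write_text(json.dumps(data, indent=2) + "\n")
    return updated


# In-memory demonstration with a made-up layout (not the real file).
sample = {"rules": [{"name": "demo", "fovCountViolationIncidentThreshold": 1}]}
print(set_key_everywhere(sample, "fovCountViolationIncidentThreshold", 5))
print(sample["rules"][0]["fovCountViolationIncidentThreshold"])
```

To apply it to the real file, pass the configuration path given above to `update_config`, keep a backup copy first, and redeploy so the new value is picked up. Note that rewriting the file with `json.dumps` normalizes its indentation.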
