Alert Verification Workflow#

Two approaches for leveraging VLMs to generate alerts are showcased as part of the agent workflows:

  • Alert Verification: The VLM analyzes video snippets corresponding to alerts generated upstream and verifies them; the original alerts are generated through a combination of object detection/tracking and behavior analytics microservices that process video streams in real time. This approach invokes the VLM more sporadically and therefore has lower GPU requirements, but it depends on an upstream entity to generate “candidate” alerts for verification.

  • Real-Time Alerts: The VLM continuously processes segments from a video source (e.g., a camera) at periodic intervals based on a user-defined chunk duration. This approach leverages the generalizability of VLMs to trigger alerts for a broad set of cases (VLM fine-tuning or prompt tuning may be needed). However, it has higher GPU requirements due to more frequent VLM usage.

This section addresses the Alert Verification workflow; see the next section for Real-Time Alerts.

Use Cases for Alert Verification

  • PPE compliance verification (hard hats, safety vests)

  • Restricted area monitoring

  • Asset presence/absence detection

  • Custom object detection scenarios

Estimated Deployment Time: 15-20 minutes

The following diagram illustrates the alert verification workflow architecture:

Vision Agent with Alert Verification Architecture

Key Features of the Alert Verification Agent:

  • RTVI CV for real-time object detection using Grounding DINO (open-vocabulary detection)

  • Behavior Analytics for rule-based and configurable alert generation from detection results

  • Alert Verification for VLM-based alert clip review to reduce false positives

  • Alert storage for querying and reporting

  • Report Generation

What’s being deployed#

  • NVStreamer: Video streaming service for dataset video playback, replicating live cameras in a real-world deployment

  • Video IO & Storage (VIOS): Video ingestion (of NVStreamer video streams) supporting live streaming, recording, and playback features used by the agent

  • RTVI CV: Real-Time Video Intelligence CV Microservice for object detection that processes VIOS live streams to output metadata to Kafka

  • Behavior Analytics: Processes metadata from RTVI CV to generate alerts

  • Alert Verification: Verification of alert video using VLM

  • Cosmos Reason (NIM): Vision-language model with physical reasoning capabilities used by Alert Verification

  • ELK: Elasticsearch, Logstash, and Kibana stack for log storage and analysis

  • VSS Agent: Agent service that orchestrates tool calls and model inference to answer questions and generate outputs

  • Nemotron LLM (NIM): LLM inference service used for reasoning, tool selection, and response generation

  • Phoenix: Observability and telemetry service for agent workflow monitoring

Prerequisites#

Before you begin, ensure all of the prerequisites are met. See Prerequisites for more details.

Deploy#

Note

For instructions on downloading sample data and the deployment package, see Download Sample Data and Deployment Package in the Quickstart guide.

Skip to Step 1: Deploy the Agent if you have already downloaded and deployed another agent workflow.

Step 1: Deploy the Agent#

Note

Set your NGC CLI API key before deploying, and use the --help flag to view all available options:

# Set NGC CLI API key
export NGC_CLI_API_KEY='your_ngc_api_key'

# View all available options
scripts/dev-profile.sh --help

Then run the command set that matches your hardware, choosing local, remote, or mixed LLM/VLM placement.

H100:

# Local LLM and VLM
scripts/dev-profile.sh up -p alerts -m verification -H H100

# Local LLM and VLM on specific GPUs
scripts/dev-profile.sh up -p alerts -m verification -H H100 \
    --llm-device-id 1 --vlm-device-id 2

# Remote LLM, local VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H H100 \
    --use-remote-llm

# Local LLM, remote VLM
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H H100 \
    --use-remote-vlm

# Remote LLM and VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H H100 \
    --use-remote-llm --use-remote-vlm

RTX PRO 6000 Blackwell:

# Local LLM and VLM
scripts/dev-profile.sh up -p alerts -m verification -H RTXPRO6000BW

# Local LLM and VLM on specific GPUs
scripts/dev-profile.sh up -p alerts -m verification -H RTXPRO6000BW \
    --llm-device-id 1 --vlm-device-id 2

# Remote LLM, local VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H RTXPRO6000BW \
    --use-remote-llm

# Local LLM, remote VLM
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H RTXPRO6000BW \
    --use-remote-vlm

# Remote LLM and VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H RTXPRO6000BW \
    --use-remote-llm --use-remote-vlm

L40S:

# Local LLM and VLM on specific GPUs
scripts/dev-profile.sh up -p alerts -m verification -H L40S \
    --llm-device-id 1 --vlm-device-id 2

# Remote LLM, local VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H L40S \
    --use-remote-llm

# Local LLM, remote VLM
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H L40S \
    --use-remote-vlm

# Remote LLM and VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H L40S \
    --use-remote-llm --use-remote-vlm

DGX Spark:

See VSS-Agent-Customization-configure-llm for remote LLM endpoint options.

export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H DGX-SPARK \
    --use-remote-llm

IGX Thor:

See VSS-Agent-Customization-configure-llm for remote LLM endpoint options.

export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H IGX-THOR \
    --use-remote-llm

AGX Thor:

See VSS-Agent-Customization-configure-llm for remote LLM endpoint options.

export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H AGX-THOR \
    --use-remote-llm

Other hardware:

See Local LLM and VLM deployments on OTHER hardware for known limitations and constraints.

# Local LLM and VLM via custom env files
scripts/dev-profile.sh up -p alerts -m verification -H OTHER \
    --llm-env-file /path/to/llm.env --vlm-env-file /path/to/vlm.env

# Local LLM and VLM on specific GPUs
scripts/dev-profile.sh up -p alerts -m verification -H OTHER \
    --llm-device-id 1 --vlm-device-id 2 \
    --llm-env-file /path/to/llm.env --vlm-env-file /path/to/vlm.env

# Remote LLM, local VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H OTHER \
    --use-remote-llm --vlm-env-file /path/to/vlm.env

# Local LLM, remote VLM
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H OTHER \
    --use-remote-vlm --llm-env-file /path/to/llm.env

# Remote LLM and VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H OTHER \
    --use-remote-llm --use-remote-vlm

This command will download the necessary containers from the NGC Docker registry and start the agent. Depending on your network speed, this may take a few minutes.

This deployment uses the following defaults:

  • Host IP: the source (src) IP reported by ip route get 1.1.1.1

  • LLM model: nvidia/nvidia-nemotron-nano-9b-v2

  • VLM model: nvidia/cosmos-reason2-8b

To use a different IP than the one derived:

  • -i: Manually specify the host IP address.

  • -e: Optionally specify an externally accessible IP address for services that need to be reached from outside the host.
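
You can preview which IP the default derivation will pick before deciding whether to override it. The sketch below runs the extraction on a captured sample route line so it is self-contained; the sample line and the sed pattern are illustrative, not the script's exact implementation:

```shell
# Captured sample output of `ip route get 1.1.1.1` (illustrative).
sample='1.1.1.1 via 10.0.0.1 dev eth0 src 10.0.0.5 uid 1000'

# Extract the `src` field, the same field the default host IP comes from.
HOST_IP=$(printf '%s\n' "$sample" | sed -n 's/.*src \([0-9.]*\).*/\1/p')
echo "$HOST_IP"   # → 10.0.0.5
```

If the derived IP is not the one you want, pass -i (and optionally -e) to scripts/dev-profile.sh instead.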

Note

When using a remote VLM of model-type nim (not openai), see How does a remote nim VLM access videos? for access requirements.

Once the deployment is complete, check that all the containers are running and healthy:

docker ps

Once all the containers are running, you can access the agent UI at http://<HOST_IP>:3000/.
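
As a hedged sketch of what to look for in the docker ps output, any container whose status is not yet `(healthy)` needs more time or investigation. The filter below runs against a captured sample listing (container names are hypothetical) so the sketch is self-contained:

```shell
# Captured sample of `docker ps --format '{{.Names}}: {{.Status}}'`
# (container names are hypothetical).
statuses='vss-agent: Up 2 minutes (healthy)
rtvi-cv: Up 2 minutes (health: starting)
kibana: Up 2 minutes (healthy)'

# Print only the containers that are not yet healthy.
printf '%s\n' "$statuses" | grep -v '(healthy)'
# → rtvi-cv: Up 2 minutes (health: starting)
```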

If you would like to adapt the workflow to other videos or use cases, update the following files:

  • deployments/developer-workflow/dev-profile-alerts/vlm-as-verifier/configs/alert_type_config.json

    • Modify the “output_category” and “user” fields. “output_category” sets the category type shown in the agent UI; “user” is the user prompt sent to the VLM to verify each alert clip.

  • deployments/developer-workflow/dev-profile-alerts/deepstream/configs/config_triton_nvinferserver_gdino.txt

    • Update the “type_name” field under “postprocess” to change the objects detected by Grounding DINO. By default this is “person” with a threshold of 0.5. Multiple classes can be specified by separating them with “ . ” as a delimiter.

  • deployments/developer-workflow/dev-profile-alerts/vss-behavior-analytics/configs/vss-behavior-analytics-kafka-config.json

    • Update the “value” field under "name": "fovCountViolationIncidentObjectType" to change the objects for which Behavior Analytics creates alerts.
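
The first of the edits above can be sketched from the shell. The field names "output_category" and "user" come from this guide, but the stand-in file below (created locally so the sketch is self-contained) and the replacement values are assumptions; apply the same substitutions to the real alert_type_config.json path listed above:

```shell
# Minimal stand-in for alert_type_config.json (overall shape assumed).
cat > alert_type_config.json <<'EOF'
{
  "output_category": "PPE Violation",
  "user": "Does the clip show a person without a hard hat?"
}
EOF

# Swap in a new UI category and a new VLM verification prompt.
sed -i.bak \
    -e 's/"output_category": *"[^"]*"/"output_category": "Restricted Area"/' \
    -e 's/"user": *"[^"]*"/"user": "Is any person inside the restricted zone?"/' \
    alert_type_config.json

cat alert_type_config.json
```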

Step 2: Add a video stream#

Add an RTSP stream by clicking the “+ Add RTSP” button under the “Video Management” tab in the agent UI. If you do not have an RTSP stream, you can use NVStreamer at http://<HOST_IP>:31000 to upload a video file and create an RTSP stream.

For this profile, use the sample-warehouse-ladder.mp4 stream.

Upload RTSP Stream

Note

By default, this profile supports only one stream being processed at a time.

You can increase this by modifying the NUM_STREAMS environment variable in the deployments/developer-workflow/dev-profile-alerts/.env file before deployment. On RTX PRO 6000, up to 4 streams at 10 fps have been tested to work.

On edge platforms (DGX Spark, IGX Thor, and AGX Thor), up to 1 stream at 10 fps has been tested to work due to GPU requirements.
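
The NUM_STREAMS change is a one-line edit. A stand-in .env file is created below so the sketch is self-contained; apply the same sed to deployments/developer-workflow/dev-profile-alerts/.env before deploying (the pattern only touches the NUM_STREAMS line):

```shell
# Stand-in for the profile's .env file.
cat > dev-profile-alerts.env <<'EOF'
NUM_STREAMS=1
EOF

# Raise the stream limit (up to 4 streams at 10 fps tested on RTX PRO 6000).
sed -i.bak 's/^NUM_STREAMS=.*/NUM_STREAMS=4/' dev-profile-alerts.env
cat dev-profile-alerts.env   # → NUM_STREAMS=4
```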

Step 3: Verify pipeline components#

Open the Kibana UI at http://<HOST_IP>:5601/app/home#/ and navigate to the Discover tab.

Verify that the following data indices are populated; it may take a few minutes for data to start appearing after the stream is added:

  • mdx-raw-* - Raw detection data

  • mdx-incidents-* - Generated incidents

  • mdx-vlm-incidents-* - VLM-verified alerts

Step 4: View alerts in the Agent UI#

Launch the Agent UI at http://<HOST_IP>:3000/.

List streams to verify connectivity, then use the Alert Tab to list alerts. Select the verified alerts option to view VLM-verified alerts.

Alerts Tab in the Agent UI

You can then click a video thumbnail to play the video and view the alert. Playback includes an overlay with bounding boxes around objects of interest.

Video Playback with Bounding Boxes

Step 5: Generate a Report for the Alert#

You can use the chat interface to request a report for the generated alerts. The report is currently generated in Markdown format and displayed in the VSS UI.

First, identify the alert ID for which the report should be generated. You can retrieve the ID through the chat interface by listing the most recent alerts by count; alternatively, expand any alert in the Alerts tab to display its “Id” along with the other metadata associated with the alert.

Then use the ID to request report generation, specifying the associated sensor as shown in the sample image below.

Report Generation

Step 6: Tear down the agent#

To tear down the agent, run the following command:

scripts/dev-profile.sh down

This command will stop and remove the agent containers.

Service Endpoints#

Once deployed, the following services are available:

  • VSS UI: http://<HOST_IP>:3000

  • Kibana UI: http://<HOST_IP>:5601/app/home#/

  • NVStreamer UI: http://<HOST_IP>:31000/#/dashboard

  • VST UI: http://<HOST_IP>:30888/vst/#/dashboard

  • Phoenix UI: http://<HOST_IP>:6006/projects
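
As a small convenience sketch, the endpoint list can be expanded for a concrete host from the shell. HOST_IP below is a placeholder; the ports and paths are the ones listed above:

```shell
HOST_IP=192.0.2.10   # placeholder; substitute your host's IP

# Expand each "Name:port/path" entry into a full URL.
for entry in 'VSS UI:3000' 'Kibana UI:5601/app/home#/' \
             'NVStreamer UI:31000/#/dashboard' \
             'VST UI:30888/vst/#/dashboard' 'Phoenix UI:6006/projects'; do
    name=${entry%%:*}
    path=${entry#*:}
    printf '%s -> http://%s:%s\n' "$name" "$HOST_IP" "$path"
done
# → VSS UI -> http://192.0.2.10:3000
#   ...
```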

Next Steps#

Once you’ve familiarized yourself with the alert verification workflow, you can explore:

  • Modifying the alert prompt in the Alert Verification Microservice configuration.

  • Adjusting rate limit settings to control alert verification frequency.

  • Configuring G-DINO prompting and class thresholds for custom detection scenarios.

Known Issues#

  • Some inaccuracies might be observed with the public CR2 VLM model.

  • Video snippets generated for alerts may be short (e.g., only a couple of seconds) depending on how behavior analytics processes the specific video, which can impact VLM accuracy. To address this, set fovCountViolationIncidentThreshold to the desired minimum alert clip duration in deployments/developer-workflow/dev-profile-alerts/vss-behavior-analytics/configs/vss-behavior-analytics-kafka-config.json.

  • Video playback duration for verified alerts may not exactly match the alert timestamps.

  • Report generation may not work correctly with this profile.

  • If perception crashes and restarts, streams are not automatically re-added and alerts will not be generated.

  • For remote VLM and LLM deployments, the alert verification timeout may need to be increased from the default value of 5 seconds. See Alert Verification VLM Configuration Options for specific details.
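
The short-snippet workaround above can be sketched the same way as the other config edits. The setting name fovCountViolationIncidentThreshold comes from this guide, but the JSON fragment below is a local stand-in (its shape, the units, and the value 10 are assumptions for illustration) for the relevant entry in vss-behavior-analytics-kafka-config.json:

```shell
# Stand-in fragment for the behavior-analytics Kafka config entry.
cat > ba-config-fragment.json <<'EOF'
{ "name": "fovCountViolationIncidentThreshold", "value": "2" }
EOF

# Raise the threshold so alert clips are not too short (value and units
# are illustrative; verify against your deployment's configuration docs).
sed -i.bak 's/\("fovCountViolationIncidentThreshold", "value": "\)[0-9]*/\110/' \
    ba-config-fragment.json
cat ba-config-fragment.json
```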