Real-Time Alert Workflow#
The Real-Time Alert Workflow monitors live video streams and generates alerts when the VLM detects anomalies or specified events.
Use Cases
Traffic collision detection
Unusual behavior detection
Equipment malfunction identification
Safety hazard detection
Estimated Deployment Time: 15-20 minutes
The following diagram illustrates the real-time alert workflow architecture:
Key Features of the Real-Time Alert Agent:
Continuous frame sampling from video streams
Natural language queries for detected alerts
VLM-based anomaly detection using the RTVI microservice
Configurable alert prompts and invocation settings for custom detection scenarios
What’s being deployed#
VSS Agent: Agent service that orchestrates tool calls and model inference to answer questions and generate outputs
VSS Agent UI: Web UI with chat, video upload, and different views
RTVI VLM: Real-Time VLM microservice for alert verification
Video IO & Storage (VIOS): Video ingestion, recording, and playback services used by the agent for video access and management
NVStreamer: Video streaming service for video playback
Nemotron LLM (NIM): LLM inference service used for reasoning, tool selection, and response generation
Cosmos Reason (NIM): Vision-language model with physical reasoning capabilities
ELK: Elasticsearch, Logstash, and Kibana stack for log storage and analysis
Phoenix: Observability and telemetry service for agent workflow monitoring
Prerequisites#
Before you begin, ensure all of the prerequisites are met. See Prerequisites for more details.
Deploy#
Note
For instructions on downloading sample data and the deployment package, see Download Sample Data and Deployment Package in the Quickstart guide.
Skip to Step 1: Deploy the Agent if you have already downloaded and deployed another agent workflow.
Step 1: Deploy the Agent#
Based on your GPU, run the appropriate command to deploy the agent:
H100 (default):
deployments/dev-profile.sh up -p alerts -m 2d_vlm
RTX6000PROBW:
deployments/dev-profile.sh up -p alerts -m 2d_vlm -H RTX6000PROBW
L40S:
deployments/dev-profile.sh up -p alerts -m 2d_vlm -H L40S
This deployment uses the following defaults:
Host IP: Primary IP from ip route
Alert mode: 2d_vlm (required for this workflow)
LLM mode: local_shared
VLM mode: local_shared
LLM model: nvidia-nemotron-nano-9b-v2
VLM model: cosmos-reason2-8b
This command will download the necessary container images from the NGC Docker registry and start the agent. Depending on your network speed, this may take a few minutes.
Note
NGC API Key: The deployment requires an NGC CLI API key. You can either set it as an environment variable (export NGC_CLI_API_KEY='your_ngc_api_key') or pass it as a command-line argument using -k 'your_ngc_api_key'.
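For example, either of the following supplies the key (replace the placeholder value with your actual key):

```bash
# Option 1: set the key once in the shell environment.
export NGC_CLI_API_KEY='your_ngc_api_key'
deployments/dev-profile.sh up -p alerts -m 2d_vlm

# Option 2: pass the key inline for a single invocation.
deployments/dev-profile.sh up -p alerts -m 2d_vlm -k 'your_ngc_api_key'
```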
Note
For advanced deployment options such as hardware profiles, NIM configurations, and model selection, see Advanced Deployment Options.
Once complete, check that all the containers are running and healthy:
docker ps
Once all the containers are running, you can access the agent UI at http://<HOST_IP>:3000/.
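As a quick sanity check, assuming the default ports, you can confirm container health and UI reachability from the shell:

```bash
# Any output here indicates a container failing its health check.
docker ps --filter "health=unhealthy"

# A 200 OK response confirms the agent UI is serving (replace <HOST_IP>).
curl -I http://<HOST_IP>:3000/
```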
Step 2: Add a video stream#
Add an RTSP stream by clicking the “+ Add RTSP” button under the “Video Management” tab in the agent UI. If you do not have an RTSP stream, you can use NVStreamer at http://<HOST_IP>:31000 to upload a video file and create an RTSP stream.
For this profile, use the warehouse_sample.mp4 stream. When prompting, ensure the sensor name matches exactly what you configured for the stream (e.g., warehouse_sample).
Note
By default, this profile supports processing only one stream at a time.
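If you want to verify an RTSP URL before adding it, ffprobe (part of ffmpeg, assuming it is installed on the host) can confirm the stream is reachable; the URL placeholder below is illustrative:

```bash
# Prints codec and resolution details if the stream is reachable.
ffprobe -rtsp_transport tcp -v error -show_streams rtsp://<RTSP_STREAM_URL>
```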
Step 3: Start a real-time alert#
Launch the Agent UI at http://<HOST_IP>:3000/.
Use the Chat Tab to interact with the system:
Start an RTVI real-time alert for a stream by specifying the alert type.
Sample prompt: Start real-time alert for boxes dropped on sensor warehouse_sample
To view the reasoning trace for alert detection, click on the “Trace” icon in the alert details. This shows the VLM’s analysis process and decision-making steps.
Click the “Alerts” tab on the left-hand side and enable “VLM Verified” to view the verified alerts for all active sensors.
Step 4: Tear Down the Agent#
To tear down the agent, run the following command:
deployments/dev-profile.sh down
This command will stop and remove the agent containers.
Service Endpoints#
Once deployed, the following services are available:
| Service | URL |
|---|---|
| VSS UI | http://<HOST_IP>:3000/ |
| NVStreamer UI | http://<HOST_IP>:31000 |
| VST UI | |
| Phoenix UI | |
Appendix: Advanced Deployment Options#
The deployment script supports additional configuration options for advanced use cases. To view all available options, run:
deployments/dev-profile.sh --help
Hardware Profile#
Specify the hardware profile to optimize for your GPU:
deployments/dev-profile.sh up -p alerts -m 2d_vlm -H RTX6000PROBW
Available hardware profiles: H100 (default), L40S, RTX6000PROBW
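If you are unsure which profile matches your machine, nvidia-smi reports the installed GPU model:

```bash
# Print the model name of each installed GPU.
nvidia-smi --query-gpu=name --format=csv,noheader
```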
Host IP Configuration#
Manually specify the host IP address:
deployments/dev-profile.sh up -p alerts -m 2d_vlm -i '<HOST_IP>'
Default: Primary IP from ip route
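To preview which address the default will likely resolve to, you can query the routing table yourself; the deployment script's exact parsing may differ from this sketch:

```bash
# Show the source IP used to reach an external address;
# this approximates what the deploy script derives from ip route.
ip route get 1.1.1.1 | awk '{for (i = 1; i < NF; i++) if ($i == "src") print $(i+1)}'
```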
Externally Accessible IP#
Optionally specify an externally accessible IP address for services that need to be reached from outside the host:
deployments/dev-profile.sh up -p alerts -m 2d_vlm -e '<EXTERNALLY_ACCESSIBLE_IP>'
LLM and VLM Configuration#
Configure the LLM and VLM (NVIDIA Inference Microservice) modes independently:
deployments/dev-profile.sh up -p alerts -m 2d_vlm \
--llm-mode local \
--vlm-mode local
Available modes:
local_shared (default): Shared local NIM instance
local: Dedicated local NIM instance
remote: Use remote NIM endpoints
Constraint: Either both --llm-mode and --vlm-mode are set to local_shared, or neither is.
For remote LLM and VLM, specify the base URLs:
deployments/dev-profile.sh up -p alerts -m 2d_vlm \
--llm-mode remote \
--vlm-mode remote \
--llm-base-url https://your-llm-endpoint.com \
--vlm-base-url https://your-vlm-endpoint.com
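Before deploying in remote mode, it can help to confirm the endpoints respond. Assuming they are standard NIM deployments, which expose an OpenAI-compatible API, a model listing is a quick check; the URLs below are the placeholders from the example above:

```bash
# Each should return a JSON list of served models if the endpoint is up.
curl -s https://your-llm-endpoint.com/v1/models
curl -s https://your-vlm-endpoint.com/v1/models
```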
Note
To deploy your own remote NIM endpoint, refer to the NVIDIA NIM Deployment Guide for instructions on setting up NIM on your infrastructure.
Model Selection#
Specify custom LLM and VLM models:
deployments/dev-profile.sh up -p alerts -m 2d_vlm \
--llm llama-3.3-nemotron-super-49b-v1.5 \
--vlm cosmos-reason2-8b
Available LLM models: nvidia-nemotron-nano-9b-v2, nemotron-3-nano, llama-3.3-nemotron-super-49b-v1.5, gpt-oss-20b
Available VLM models: cosmos-reason1-7b, cosmos-reason2-8b, qwen3-vl-8b-instruct
Note
Only the default models nvidia-nemotron-nano-9b-v2 (LLM) and cosmos-reason2-8b (VLM) have been verified on local and local_shared NIM modes.
Device Assignment#
Assign specific GPU devices for LLM and VLM:
deployments/dev-profile.sh up -p alerts -m 2d_vlm \
--llm-device-id 0 \
--vlm-device-id 1
Note: --llm-device-id is not allowed if --llm-mode is remote. --vlm-device-id is not allowed if --vlm-mode is local_shared or remote.
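To see which device IDs are available on the host, nvidia-smi can list them; the index column is the value to pass to the flags above:

```bash
# List GPUs with their device IDs (index) and memory headroom.
nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv
```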
Custom VLM Weights#
The VSS Blueprint supports using custom VLM weights to improve anomaly detection accuracy for specific scenarios or domains.
Download Custom Weights
Before using custom weights, you need to download them from NGC or Hugging Face. For detailed instructions on downloading custom weights, see the Custom VLM Weights section in Prerequisites.
Deploy with Custom Weights
Once you have downloaded custom weights to a local directory, specify the path when deploying:
deployments/dev-profile.sh up -p alerts -m 2d_vlm \
--vlm-custom-weights /path/to/custom/weights
Skip Custom Weights
To deploy without custom weights and use the default model weights (no download, no environment variable set):
deployments/dev-profile.sh up -p alerts -m 2d_vlm \
--vlm-custom-weights None
Dry Run#
To preview the deployment commands without executing them:
deployments/dev-profile.sh up -p alerts -m 2d_vlm -d
Note: The -d or --dry-run flag is also available for the down command.
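For example, to preview the teardown without stopping anything:

```bash
deployments/dev-profile.sh down -d
```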