Alert Verification Microservice#

The Alert Verification Microservice is a modular, configuration-driven service for ingesting alerts generated by video analytics pipelines, verifying them using Vision Language Models (VLMs), and publishing the results to downstream systems.

The service supports alerts in nvschema format (nv.Incident and nv.Behavior), ingested via Kafka or an HTTP API. Alerts are typically generated by video analytics pipelines based on events detected within a camera's field of view. Upon receiving an alert, the Alert Verification Microservice retrieves the video URL for the alert time window from VIOS based on the sensor ID and timestamps, then passes this URL to a VLM-based verification backend (such as NVIDIA Cosmos Reason1 NIM) through an OpenAI-compatible API. The VLM downloads the video and analyzes it against configurable prompts.

Primary Workflow

The Alert Verification Microservice follows a structured 4-step workflow:

  1. Alert Ingestion: Alerts are received via HTTP API or Kafka topics. The system validates the message schema, applies normalization, and queues alerts for processing.

  2. Video URL Retrieval: The service extracts sensor ID, start time, and end time from the alert, then retrieves the corresponding video URL from VIOS using REST API.

  3. VLM Analysis: The video URL and a dynamically constructed prompt (based on alert type and context) are sent to the VLM backend. The VLM downloads the video, analyzes it, and returns a verdict with reasoning.

  4. Result Publishing: Verified alerts are persisted to Elasticsearch and optionally published to Kafka (enabled via configuration). Results include the original alert fields plus verification metadata.
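
The four steps above can be sketched as a single processing function. This is a minimal illustration, not the service's actual internals: the client objects (`vios_client`, `vlm_client`, `publish`) and the `build_prompt` helper are hypothetical stand-ins for the real components.

```python
def build_prompt(category, alert):
    """Hypothetical prompt builder: select a template by alert category."""
    templates = {"collision": "Did a collision occur at {place}?"}
    place = alert.get("place", {}).get("name", "unknown")
    return templates.get(category, "Describe the event.").format(place=place)

def verify_alert(alert, vios_client, vlm_client, publish):
    """Sketch of the 4-step pipeline: ingest -> video URL -> VLM -> publish."""
    # Step 1: the alert has already been schema-validated and normalized.
    sensor_id, start, end = alert["sensorId"], alert["timestamp"], alert["end"]

    # Step 2: resolve the video URL for the alert time window via VIOS.
    video_url = vios_client.get_video_url(sensor_id, start, end)

    # Step 3: render the alert-type-specific prompt and query the VLM backend.
    prompt = build_prompt(alert["category"], alert)
    verdict, reasoning = vlm_client.analyze(video_url, prompt)

    # Step 4: attach verification metadata and publish to configured sinks.
    alert.setdefault("info", {}).update({"verdict": verdict, "reasoning": reasoning})
    publish(alert)
    return alert
```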

Key Features#

  • Event-Driven Processing: Consumes alerts from Kafka with configurable consumer groups and offset management.

  • HTTP API Support: RESTful endpoints for direct alert submission (POST /api/v1/alerts for behaviors, POST /api/v1/incidents for incidents).

  • nvschema Compatibility: Native support for nv.Incident and nv.Behavior message formats in both Protobuf and JSON.

  • Automatic Video Resolution: Retrieves video segments from VIOS based on alert timestamps and sensor information.

  • Configurable Prompts: Alert-type-specific prompts with template placeholders for dynamic content injection from alert payloads.

  • Configurable VLM Parameters: Tune video preprocessing and inference settings to match deployment requirements and VLM backend capabilities.

  • Structured VLM Responses: Standardized response format with verdict (confirmed/rejected/unverified), reasoning trace, and HTTP-like status codes.

  • Flexible Output Options: Persist results to Elasticsearch for analytics, with optional Kafka publishing.

  • Custom Category Names: Configure user-friendly display names for alert categories in output (e.g., "collision" displays as "Vehicle Collision" in Elasticsearch and UI).

  • Alert Enrichment: Optionally generate detailed event descriptions after successful verification, providing additional context such as objects involved, sequence of events, and environmental factors.

Setup#

The Alert Verification Microservice supports the following deployment options:

  • Docker Compose deployment

  • Kubernetes deployment

Note

For detailed deployment instructions, refer to the Docker Compose Deployment and Kubernetes Deployment sections.

Prerequisites

Before deploying the Alert Verification Microservice, ensure the following requirements are met:

  • A valid NGC API key is required for accessing NVIDIA services.

  • A VLM backend (e.g., NVIDIA Cosmos Reason1 NIM) must be deployed and accessible.

  • VIOS service must be available for video retrieval.

  • Kafka is required if using event-driven ingestion.

  • Elasticsearch (optional) for result persistence and analytics.

Configuring Prompts#

Prompts control how the VLM analyzes video content for each alert type. They are configured via a JSON file (alert_type_config.json) that is loaded at startup and stored in RedisJSON for runtime access.

Note

Prompt configuration changes require a container restart in the current release. API-based prompt management is not supported.

Prompt Structure#

Each alert type requires a configuration with the following components:

  • user (mandatory): The primary instruction describing how to analyze the video and what decision to make.

  • system (optional): Contextual setup that shapes the VLM’s behavior, tone, and domain knowledge.

  • enrichment (optional): A prompt to generate detailed event descriptions after successful verification. When defined, a second VLM call provides additional context such as objects involved, sequence of events, and environmental factors.

  • output_category (optional): A user-friendly display name for the alert category in output. When specified, this name appears in Elasticsearch and downstream systems instead of the internal alert_type value (e.g., "collision" displays as "Vehicle Collision").

Prompts are stored in RedisJSON at startup and selected based on the alert type during processing.

Prompt Templating#

Prompts support dynamic placeholders that are substituted with values from the alert payload at runtime:

  • Placeholders use {field.path} syntax with dot notation for nested fields.

  • Examples: {info.primaryObjectId}, {place.name}, {objectIds}

  • Missing fields render as <missing:field.path> for debugging.

  • Array values are automatically joined as comma-separated strings.
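
The substitution rules above can be illustrated with a short sketch. This approximates the described behavior and is not the service's actual template engine:

```python
import re

def render_prompt(template, payload):
    """Substitute {field.path} placeholders with values from the alert payload."""
    def resolve(match):
        path = match.group(1)
        value = payload
        for key in path.split("."):           # dot notation for nested fields
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                return f"<missing:{path}>"    # missing fields render for debugging
        if isinstance(value, list):           # arrays join as comma-separated strings
            return ", ".join(str(v) for v in value)
        return str(value)
    return re.sub(r"\{([\w.]+)\}", resolve, template)
```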

Alert Type Matching#

The category field in the incoming alert is matched against the alert_type in the prompt configuration. When a match is found, the corresponding prompts are selected and rendered with alert context.

VLM Response Format#

The VLM must respond using the following format for proper parsing:

<think>
reasoning trace here
</think>
<answer>
verdict (A/B or true/false)
</answer>

The answer section should contain a clear verdict that the system can map to confirmed/rejected status.
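
A parser for this format might look like the following sketch. The exact mapping from answer text to verdict is an assumption for illustration (the real service may use stricter matching):

```python
import re

def parse_vlm_response(text):
    """Extract (reasoning, verdict) from a <think>/<answer> formatted response."""
    think = re.search(r"<think>\s*(.*?)\s*</think>", text, re.DOTALL)
    answer = re.search(r"<answer>\s*(.*?)\s*</answer>", text, re.DOTALL)
    if not answer:
        return "", "unverified"               # no answer section: verification incomplete
    raw = answer.group(1).strip().lower()
    # Approximate mapping: A/true-style answers confirm, B/false-style reject.
    if raw.startswith(("a", "true")):
        verdict = "confirmed"
    elif raw.startswith(("b", "false")):
        verdict = "rejected"
    else:
        verdict = "unverified"
    reasoning = think.group(1).strip() if think else ""
    return reasoning, verdict
```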

Example Configuration#

Below is an example prompt configuration from alert_type_config.json:

{
  "version": "1.0",
  "alerts": [
    {
      "alert_type": "collision",
      "output_category": "Vehicle Collision",
      "prompts": {
        "system": "You are an expert AI assistant for video analysis. Your task is to determine whether a surveillance video depicts a **collision event** or **no collision**, based on the definitions below...",
        "user": "Based on the video, which category best describes what occurred at {place.name}:\n(A) Collision (physical contact or impact detected)\n(B) No collision (no contact or impact)...",
        "enrichment": "Provide a detailed description of this collision event. Include: 1) Objects/vehicles involved and their identifiers 2) Sequence of events leading to the collision 3) Estimated speeds and trajectories 4) Visible damage or safety concerns 5) Environmental factors such as lighting, weather, or obstacles"
      }
    },
    {
      "alert_type": "Stop Anomaly Module",
      "output_category": "Abnormal Vehicle Stop",
      "prompts": {
        "system": "You are an expert AI assistant for video analysis. Your task is to determine whether a surveillance video depicts **normal stopping behavior** or a **stop anomaly**, based on the strict definitions below...",
        "user": "Based on the video, which category best describes the observed behavior at {place.name} (if available):\n(A) Stop anomaly (unexpected or unsafe stop)\n(B) Normal stop (safe and expected behavior)..."
      }
    }
  ]
}

In this example:

  • The collision alert type uses output_category to display as "Vehicle Collision" in output, and includes an enrichment prompt for detailed event descriptions.

  • The Stop Anomaly Module uses output_category but no enrichment prompt (enrichment is optional per alert type).

Alert Messages and Schema Format#

The Alert Verification Microservice supports nvschema message formats for both alerts and incidents:

  • nv.Incident: For incident-type events requiring verification.

  • nv.Behavior: For behavior/alert-type events requiring verification.

Both Protobuf and JSON formats are supported for message serialization.

Note

For the complete schema definitions of nv.Incident and nv.Behavior messages, refer to the Protobuf Schema documentation.

Alert Ingestion Endpoints#

HTTP API

  • POST /api/v1/alerts: Submit alerts in nv.Behavior format (JSON or Protobuf).

  • POST /api/v1/incidents: Submit incidents in nv.Incident format (JSON or Protobuf).

Responses: 202 Accepted (queued), 422 (validation error), 415 (unsupported media type), 500 (internal error).
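
Submitting a behavior alert over HTTP can be done with the standard library alone; a minimal sketch (the base URL is a placeholder for your deployment):

```python
import json
import urllib.request

def build_alert_request(base_url, alert):
    """Build a POST request for the /api/v1/alerts endpoint (JSON body)."""
    return urllib.request.Request(
        f"{base_url}/api/v1/alerts",
        data=json.dumps(alert).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send: urllib.request.urlopen(build_alert_request(url, alert))
# Expect 202 Accepted when the alert is queued, 422 on schema validation errors.
```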

Kafka Topics (configurable)

  • Input: mdx-incidents (nv.Incident), mdx-alerts (nv.Behavior)

  • Output: mdx-vlm-incidents, mdx-vlm-alerts (when Kafka sink enabled)

Key Fields#

The following table describes key fields in alert messages:

| Field | Description |
| --- | --- |
| sensorId | Camera or sensor identifier used for video retrieval. |
| timestamp | Event start time in ISO 8601 format. |
| end | Event end time in ISO 8601 format. |
| objectIds | List of tracked object IDs involved in the event. |
| category | Alert type used for prompt matching (e.g., "collision", "Stop Anomaly Module"). |
| place.name | Location identifier (e.g., intersection name, zone). |
| info | Additional metadata; extended with VLM verification results in responses. |

Verification Response#

Verified alerts include the original fields plus an extended info section containing:

| Field | Description |
| --- | --- |
| verification_response_code | HTTP-like numeric code: 200 = success, 4xx = client errors, 5xx = server/VLM errors. |
| verification_response_status | Error description when non-200 status. |
| verdict | One of: "confirmed" (alert verified true), "rejected" (alert verified false), "unverified" (verification incomplete due to error). |
| reasoning | VLM reasoning trace extracted from the <think> section. |

Example Request#

{
  "sensorId": "Lafayette_Agnew",
  "timestamp": "2025-09-11T00:08:27.822Z",
  "end": "2025-09-11T00:09:22.122Z",
  "objectIds": [
    "958741182",
    "958750871",
    "958834290",
    "958730631"
  ],
  "place": {
    "name": "city=Montague/intersection=Lafayette_Agnew"
  },
  "analyticsModule": {
    "id": "Collision Detection Module",
    "description": "Potential collision detected between 4 vehicles"
  },
  "category": "collision",
  "isAnomaly": true,
  "info": {
    "location": "42.48837572978232,-90.73894264480816,0.0",
    "primaryObjectId": "958750871"
  }
}

Example Response#

{
  "sensorId": "Lafayette_Agnew",
  "timestamp": "2025-09-11T00:08:27.822Z",
  "end": "2025-09-11T00:09:22.122Z",
  "objectIds": [
    "958741182",
    "958750871",
    "958834290",
    "958730631"
  ],
  "place": {
    "name": "city=Montague/intersection=Lafayette_Agnew"
  },
  "analyticsModule": {
    "id": "Collision Detection Module",
    "description": "Potential collision detected between 4 vehicles"
  },
  "category": "collision",
  "isAnomaly": true,
  "info": {
    "location": "42.48837572978232,-90.73894264480816,0.0",
    "primaryObjectId": "958750871",
    "verification_response_code": "200",
    "verification_response_status": "OK",
    "reasoning": "The video shows vehicle 958750871 approaching the intersection...",
    "verdict": "confirmed"
  }
}

Scaling and Performance Tuning#

The Alert Verification Microservice supports concurrent processing to maximize throughput. This section describes the key configuration parameters for scaling the service to meet your deployment requirements.

Concurrency Model#

The Alert Verification Microservice uses a main thread with a fixed-size worker pool for concurrent processing:

  1. Main Thread: Polls alerts from Kafka (or receives via HTTP API) and dispatches them to available workers.

  2. Worker Threads: Each worker processes an assigned alert through the full pipeline:

    • Retrieves video URL from VIOS for the alert time window

    • Renders the prompt with alert context

    • Passes video URL, prompts and other required inputs to VLM backend for verification

    • Publishes the verified result to configured sinks

Workers operate independently, allowing multiple alerts to be processed in parallel. The main thread blocks when all workers are busy, providing natural backpressure. The end-to-end latency of external services (VLM, VIOS) directly bounds throughput per worker.

Key Scaling Parameters#

The following parameters control concurrency and throughput:

| Parameter | Default | Impact |
| --- | --- | --- |
| alert_agent.num_workers | 1 | Number of concurrent worker threads. Increasing this value scales alert processing throughput roughly linearly until constrained by CPU/memory or downstream service limits (VLM, VIOS). |
| alert_agent.chunk_size | 1 | Number of alerts processed per batch within a worker. Higher values reduce overhead but increase memory usage. |
| kafka.max_poll_records | 10 | Maximum records fetched per Kafka poll. Controls batch size for ingestion; higher values improve throughput but increase memory and latency variance. |
| kafka.poll_timeout | 100 | Kafka consumer poll wait timeout (ms). Lower values reduce latency for sparse traffic; higher values reduce CPU usage. |
| kafka.max_poll_interval_ms | 60000 | Maximum interval between polls before the consumer is considered failed. Must exceed worst-case VLM processing time multiplied by batch size. |

Scaling Guidelines#

Vertical Scaling (Single Instance)

  1. Increase num_workers: Start with the number of CPU cores available. Monitor CPU utilization and increase until the VLM backend becomes the bottleneck.

  2. Tune chunk_size: For high-volume deployments, increase to 2-4 to reduce per-message overhead. Keep at 1 for low-latency requirements.

  3. Adjust max_poll_records: Match to num_workers × chunk_size for optimal batching. Avoid setting too high to prevent memory pressure.
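
The sizing guidance above follows Little's Law: required workers ≈ arrival rate × per-alert latency. A small helper (illustrative only, using the parameter names from the tables in this section) makes the arithmetic explicit:

```python
import math

def size_workers(alerts_per_sec, vlm_latency_sec, chunk_size=1):
    """Estimate scaling parameters from target throughput and VLM latency."""
    # Little's Law: concurrency needed to sustain the arrival rate.
    num_workers = math.ceil(alerts_per_sec * vlm_latency_sec)
    # Match Kafka batching to worker capacity.
    max_poll_records = num_workers * chunk_size
    return {"num_workers": num_workers, "max_poll_records": max_poll_records}

# e.g. 10 alerts/sec at 0.5 s VLM latency -> 5 workers
```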

Horizontal Scaling (Multiple Instances)

  1. Kafka Consumer Groups: Multiple Alert Verification Microservice instances can share the same event_bridge.kafka_source.group_id to distribute load across partitions.

  2. Partition Alignment: Ensure Kafka topic partitions >= number of consumer instances for effective load distribution.

  3. Stateless Design: The Alert Verification Microservice is stateless; scale horizontally by adding replicas in Kubernetes or running additional Docker containers.

Performance Tuning Example#

For a deployment targeting 10 alerts/second with average VLM latency of 0.5 seconds:

alert_agent:
  num_workers: 5         # 10 alerts/sec × 0.5 sec latency = 5 concurrent
  chunk_size: 1          # Process one alert at a time for consistent latency

kafka:
  max_poll_records: 5    # Match worker count
  poll_timeout: 100      # Low latency polling
  max_poll_interval_ms: 30000   # 30 seconds sufficient for fast VLM

Note

Monitor the following metrics to identify bottlenecks:

  • Worker utilization (all workers busy = scale up)

  • VLM request latency and error rates

  • Kafka consumer lag (growing lag = scale up or optimize)

  • Memory and CPU utilization

Configuration#

The Alert Verification Microservice is configured via a YAML-based configuration file. The following tables describe key configuration parameters organized by category.

VIOS Configuration#

| Parameter | Default | Description |
| --- | --- | --- |
| vst_config.base_url | http://localhost:30888 | Base URL for VIOS APIs. |
| vst_config.sensor_list_endpoint | /vst/api/v1/sensor/streams | Endpoint for retrieving sensor/stream list. |
| vst_config.segment_anchor | end | Segment anchor mode (end for end-anchored window). |
| vst_config.segment_duration_seconds | 10 | Clip segment duration in seconds. |
| vst_config.add_overlay | false | Enable bounding box overlay on video segments. |

Kafka Configuration#

| Parameter | Default | Description |
| --- | --- | --- |
| kafka.bootstrap_servers | localhost:9092 | Kafka broker addresses. |
| kafka.group_id | kafka-incidents-dumper | Default Kafka consumer group ID. |
| kafka.auto_offset_reset | latest | Offset reset policy (earliest/latest). |
| kafka.enable_auto_commit | false | Enable automatic offset commit. |
| kafka.max_poll_records | 10 | Maximum records per poll. |
| kafka.max_poll_interval_ms | 60000 | Maximum interval between polls (ms). |
| kafka.session_timeout_ms | 10000 | Session timeout (ms). |
| kafka.heartbeat_interval_ms | 3000 | Heartbeat interval (ms). |
| kafka.poll_timeout | 100 | Kafka consumer poll wait timeout (ms). |
| kafka.message_type | Incident | Protobuf message type (Incident or Behavior). |

Event Bridge Configuration#

| Parameter | Default | Description |
| --- | --- | --- |
| event_bridge.sourceType | kafka | Source type (kafka). |
| event_bridge.sinkType | kafka | Sink type (kafka). |
| event_bridge.kafka_source.group_id | alert-bridge-vlm-group | Kafka source consumer group. |
| event_bridge.kafka_source.topics.incident | mdx-incidents | Kafka topic for incidents. |
| event_bridge.kafka_source.topics.alert | mdx-alerts | Kafka topic for alerts. |

VLM Configuration#

Note

For detailed VLM API documentation, refer to the Cosmos Reason1 NIM documentation: TBD

| Parameter | Default | Description |
| --- | --- | --- |
| vlm.base_url | (required) | OpenAI-compatible VLM endpoint URL. |
| vlm.model | nvidia/cosmos-reason1-7b | VLM model name. |
| vlm.max_tokens | 4096 | Maximum tokens for VLM responses. |
| vlm.min_pixels | 1568 | Minimum pixel budget for video frames. |
| vlm.max_pixels | 345600 | Maximum pixel budget for video frames. |
| vlm.num_frames | 5 | Number of video frames to sample. |
| vlm.enable_sampling | false | Enable frame sampling mode. |
| vlm.sampling_fps | 4 | Sampling FPS when enabled. |

Note

The min_pixels and max_pixels parameters must be set in accordance with the VLM’s maximum context window.

Alert Verification Microservice Configuration#

| Parameter | Default | Description |
| --- | --- | --- |
| alert_agent.num_workers | 10 | Number of worker threads for concurrent processing. |
| alert_agent.max_allowed_stream_size | 2 | Maximum stream size in minutes. |
| alert_agent.default_stream_interval | 1 | Default stream interval in minutes. |
| alert_agent.vst_pass_through_mode | false | Skip VIOS lookup; use local media files directly. |
| alert_agent.chunk_size | 1 | Chunk size for processing. |
| alert_agent.enrichment.enabled | false | Enable alert enrichment for generating detailed event descriptions after verification. |

Elasticsearch Configuration#

| Parameter | Default | Description |
| --- | --- | --- |
| elastic.enabled | true | Enable Elasticsearch persistence. |
| elastic.hosts | [http://localhost:9200] | Elasticsearch host URLs. |
| vlm_enhanced_sink.incident.elastic.index | mdx-vlm-incidents | Elasticsearch index for verified incidents. |
| vlm_enhanced_sink.alert.elastic.index | mdx-vlm-alerts | Elasticsearch index for verified alerts. |

Prompt Configuration#

| Parameter | Default | Description |
| --- | --- | --- |
| alert_type_config_file | alert_type_config.json | Path to alert type configuration file. |
| prompt.prefer_payload_prompt | false | Prefer prompts from alert payload over stored prompts. |
| prompt.override_prompts_on_start | true | Override stored prompts with config file on startup. |

Logging Configuration#

| Parameter | Default | Description |
| --- | --- | --- |
| logging.level | DEBUG | Application log level (DEBUG, INFO, WARNING, ERROR, CRITICAL). |
| logging.format | (see config) | Log message format string. |
| logging.third_party_level | WARNING | Log level for third-party libraries. |
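
Putting the tables together, a minimal YAML configuration file might look like the following sketch. Values shown are the defaults from the tables above, except vlm.base_url, which has no default; the host shown is a placeholder for your deployment:

```yaml
vst_config:
  base_url: http://localhost:30888
  segment_anchor: end
  segment_duration_seconds: 10

kafka:
  bootstrap_servers: localhost:9092
  auto_offset_reset: latest
  max_poll_records: 10
  max_poll_interval_ms: 60000

vlm:
  base_url: http://vlm-host:8000/v1   # required; placeholder value
  model: nvidia/cosmos-reason1-7b
  max_tokens: 4096

alert_agent:
  num_workers: 10
  chunk_size: 1

elastic:
  enabled: true
  hosts: ["http://localhost:9200"]

logging:
  level: DEBUG
```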

API Reference