Alerts Microservice#
The Alerts Microservice is a modular, agent-native, configuration-driven service for alert-related workflows, including:
Alert verification, contextualization, and classification: Process curated (short) clips from the VSS video pipeline (for example, RTVI CV and video analytics) through a VLM to confirm, enrich, or classify upstream alerts.
Real-time alerts: Register video streams for continuous alert generation via periodic VLM inference through the RTVI VLM microservice.
Alert-on-demand: Process short clips on demand through a VLM using preconfigured alert settings.
Key Features#
Event-Driven Processing: Consumes alerts from Kafka in nv.Incident and nv.Behavior message formats, in both Protobuf and JSON.
HTTP API Support: RESTful endpoints for various functionalities (POST /api/v1/realtime, POST /api/v1/verification/config, POST /api/v1/verification/ondemand, POST /api/v1/incidents, POST /api/v1/alerts).
Agent Skills: VSS Agent skills for full alert lifecycle management, including microservice deployment, alert creation, retrieval, and configuration.
Diverse VLM Sources: RTVI VLM microservice (default), NVIDIA VLM NIM (for example, Cosmos Reason, Qwen VL), remote model endpoints (for example, GPT from OpenAI).
Automatic Video Resolution: Retrieves video segments from VIOS based on alert timestamps and sensor information.
Configurable Prompts: Alert-type-specific prompts with template placeholders for dynamic content injection from alert payloads.
Configurable VLM Parameters: Tune video preprocessing and inference settings to match deployment requirements and VLM backend capabilities.
VLM Warmup: Automatically sends multiple warmup inferences (default 3) to the VLM backend at startup, reducing initial request latency. Warmup polls the VLM NIM health endpoint until ready, then sends configurable rounds of test video inference before alert processing begins. Controlled via the VLM_WARMUP_ENABLED environment variable (enabled by default).
Structured VLM Responses: Standardized response format with verdict (confirmed/rejected/unverified), reasoning trace, and HTTP-like status codes.
Custom VLM Response Processing: User-defined hooks for bespoke VLM response parsing.
Flexible Output Options: Persist results to Elasticsearch for analytics, with optional Kafka publishing.
Custom Category Names: Configure user-friendly display names for alert categories in output (for example, "collision" displays as "Vehicle Collision" in Elasticsearch and UI).
Alert Enrichment: Optionally generate detailed event descriptions after successful verification, providing additional context such as objects involved, sequence of events, and environmental factors.
Alert Verification#
Alert verification uses a VLM to confirm, enrich, and classify alerts generated upstream in the VSS pipeline. Each alert type is handled through a designated VLM prompt. The service supports alerts in NvSchema format (nv.Incident and nv.Behavior) via Kafka or HTTP API. Upon receiving an alert, it identifies the alert or incident type from the category attribute, retrieves the video URL for the alert time window from VIOS based on sensor ID and timestamps, and passes that URL to a VLM-based verification backend (RTVI VLM by default; NVIDIA Cosmos Reason NIM is also supported) through an OpenAI-compatible API. The VLM downloads the video and analyzes it against configurable prompts. Upstream alerts are typically generated by video analytics based on events detected within a camera’s field of view.
The alert verification workflow uses a 6-step process:
Alert Config Creation: Alert rules are defined using the POST /api/v1/verification/config API call, which specifies the alert category, VLM prompt, and VLM invocation parameters. Alternatively, rules can be defined in a config file at deployment time.
Alert Ingestion: Alerts are received via HTTP API or Kafka topics. The system validates the message schema, applies normalization, and queues alerts for processing.
Video URL Retrieval: The service extracts sensor ID, start time, and end time from the alert, then retrieves the corresponding video URL from VIOS via the REST API.
VLM Analysis: The video URL and a dynamically constructed prompt (based on alert type and context) are sent to the VLM backend. The VLM downloads the video, analyzes it, and returns a verdict with reasoning.
Result Publishing: Verified alerts are persisted to Elasticsearch and optionally published to Kafka (enabled via configuration). Results include the original alert fields plus verification metadata.
Alert Retrieval: Verified alerts are retrieved through the video analytics microservice API (GET /alerts/).
For a complete description of APIs, see the API Reference.
Real-Time Alerts#
The Alerts Microservice enables users to create real-time alert rules, which are registered with the RTVI VLM microservice for continuous processing of input streams using designated prompts. Users configure the prompt along with invocation parameters such as chunk duration, frame resolution, and frame rate. Generated alerts are published to a configurable Kafka topic and follow the nv.Incident structure defined in NvSchema.
Real-time alerts follow the process below:
Alert Config Creation: Alert rules are defined using the POST /api/v1/realtime API.
Alert Publish and Subscribe: Generated alerts are published to a configurable Kafka topic that downstream consumers can subscribe to.
Alert Storage: The system can be configured to persist alerts from Kafka to a database such as Elasticsearch. Refer to the alerts developer profile for an example.
Always-on alerts are a refinement of the real-time alerts capability: users define VLM prompts and configuration in a file, and the rule is applied automatically to every newly added stream. This saves users from invoking the alert config creation API explicitly.
Alert-on-Demand#
Third-party CV applications that generate alerts for verification can use the alert-on-demand feature to run VLM verification on an associated video snippet stored in object storage.
Alert-on-demand uses the process below:
Alert creation and media storage: The CV application identifies an alert condition and creates a corresponding video snippet, which it stores in object storage (such as MinIO).
Alert delivery: The application delivers the alert to the Alerts Microservice through the POST /api/v1/verification/ondemand API; the alert is captured using the NvSchema nv.Incident payload schema.
Alert processing and storage: The Alerts Microservice downloads the video (which may be remote), encodes it in base64 format, and submits it to the underlying VLM along with the prompt and VLM invocation parameters; the returned VLM response is stored in Elasticsearch for retrieval through APIs.
Setup#
The Alerts Microservice supports the following deployment options:
Docker Compose deployment
Kubernetes deployment
Note
For detailed deployment instructions, refer to the Docker Compose Deployment and Kubernetes Deployment sections.
Prerequisites
Before deploying the Alerts Microservice, ensure the following requirements are met:
A valid NGC API key is required for accessing NVIDIA services.
A VLM backend (e.g., NVIDIA Cosmos Reason2 NIM) must be deployed and accessible.
VIOS service must be available for video retrieval.
Kafka is required if using event-driven ingestion.
Elasticsearch (optional) for result persistence and analytics.
Configuring Prompts#
Prompts control how the VLM analyzes video content for each alert type. They are configured via a JSON file (alert_type_config.json) that is loaded at startup and stored in RedisJSON for runtime access.
Note
Prompt configuration changes require a container restart in the current release. API-based prompt management is not supported.
Prompt Structure#
Each alert type requires a configuration with the following components:
user (mandatory): The primary instruction describing how to analyze the video and what decision to make.
system (optional): Contextual setup that shapes the VLM’s behavior, tone, and domain knowledge.
enrichment (optional): A prompt to generate detailed event descriptions after successful verification. When defined, a second VLM call provides additional context such as objects involved, sequence of events, and environmental factors.
output_category (optional): A user-friendly display name for the alert category in output. When specified, this name appears in Elasticsearch and downstream systems instead of the internal alert_type value (e.g., "collision" displays as "Vehicle Collision").
vlm_params (optional): Per-alert-type VLM parameter overrides (for example, model, max_tokens, temperature, num_frames). Specified fields override the global vlm config; unspecified fields fall back to the global defaults. Unknown fields raise a startup validation error so configuration typos fail fast.
Prompts are stored in RedisJSON at startup and selected based on the alert type during processing.
vlm_params may also be updated at runtime via PUT /api/v1/verification/config/{alert_type} (see the Alert Verification API) without restarting the container. At request time the service merges all sources with the following precedence:
Runtime overrides set via the API and persisted in Redis under alert_config:{alert_type} (applied without restart).
Static vlm_params from alert_type_config.json (requires a container restart to take effect).
Global vlm config in config.yml (requires a container restart to take effect).
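The precedence above can be pictured as a layered dictionary merge. The sketch below is illustrative only and is not the service's implementation: the function name effective_vlm_params and the sample values are assumptions, and it applies the lowest-precedence source first so that later sources win.

```python
from typing import Any, Dict, Mapping, Optional


def effective_vlm_params(
    global_vlm: Mapping[str, Any],
    static_overrides: Optional[Mapping[str, Any]] = None,
    runtime_overrides: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge the three sources; a later update overrides an earlier one."""
    merged = dict(global_vlm)                # lowest precedence: global vlm config
    merged.update(static_overrides or {})    # vlm_params from alert_type_config.json
    merged.update(runtime_overrides or {})   # highest precedence: runtime API overrides
    return merged


# Hypothetical values for illustration.
params = effective_vlm_params(
    global_vlm={"model": "default-model", "max_tokens": 1024, "temperature": 0.2},
    static_overrides={"max_tokens": 2048, "temperature": 0.4, "num_frames": 10},
    runtime_overrides={"temperature": 0.1},
)
print(params)
```

Fields that no override mentions (model in this example) fall through from the global defaults, matching the fallback behavior described for vlm_params.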
Prompt Templating#
Prompts support dynamic placeholders that are substituted with values from the alert payload at runtime:
Placeholders use {field.path} syntax with dot notation for nested fields.
Examples: {info.primaryObjectId}, {place.name}, {objectIds}
Missing fields render as <missing:field.path> for debugging.
Array values are automatically joined as comma-separated strings.
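The templating rules above can be sketched in a few lines of Python. This is a minimal illustration, not the service's code: render_prompt is a hypothetical name, and the exact separator used when joining arrays (", " here) is an assumption.

```python
import re
from typing import Any

# {field.path} placeholders: an identifier followed by optional dotted segments.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")


def _lookup(payload: Any, path: str) -> Any:
    """Walk a dotted path through nested dicts; raise KeyError if any part is absent."""
    current = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(path)
        current = current[key]
    return current


def render_prompt(template: str, alert: dict) -> str:
    """Substitute placeholders with values from the alert payload."""

    def substitute(match):
        path = match.group(1)
        try:
            value = _lookup(alert, path)
        except KeyError:
            return f"<missing:{path}>"  # left visible for debugging
        if isinstance(value, (list, tuple)):
            return ", ".join(str(item) for item in value)  # assumed separator
        return str(value)

    return _PLACEHOLDER.sub(substitute, template)


alert = {
    "objectIds": ["958741182", "958750871"],
    "place": {"name": "city=Montague/intersection=Lafayette_Agnew"},
    "info": {"primaryObjectId": "958750871"},
}
print(render_prompt("Object {info.primaryObjectId} at {place.name}; tracked: {objectIds}", alert))
```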
Alert Type Matching#
The category field in the incoming alert is matched against the alert_type in the prompt configuration. When a match is found, the corresponding prompts are selected and rendered with alert context.
VLM Response Format#
The VLM must respond using the following format for proper parsing:
<think>
reasoning trace here
</think>
<answer>
verdict (A/B or true/false)
</answer>
The answer section should contain a clear verdict that the system can map to confirmed/rejected status.
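To make the format concrete, the sketch below extracts the reasoning trace and maps the answer to a verdict. It is not the built-in parser: parse_verification is a hypothetical name, and the mapping (A or true means confirmed, B or false means rejected, anything else unverified) is an assumption based on the example prompts in this section.

```python
import re

_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
_FIRST_WORD = re.compile(r"\W*(\w+)")

# Assumed mapping from answer tokens to verdicts.
_CONFIRM = {"a", "true"}
_REJECT = {"b", "false"}


def parse_verification(raw: str) -> dict:
    """Return {'verdict', 'reasoning'} from a <think>/<answer> formatted response."""
    think = _THINK.search(raw)
    answer = _ANSWER.search(raw)
    reasoning = think.group(1).strip() if think else ""

    token = ""
    if answer:
        first = _FIRST_WORD.match(answer.group(1))
        token = first.group(1).lower() if first else ""

    if token in _CONFIRM:
        verdict = "confirmed"
    elif token in _REJECT:
        verdict = "rejected"
    else:
        verdict = "unverified"  # missing or unrecognized answer
    return {"verdict": verdict, "reasoning": reasoning}


sample = "<think>\nTwo vehicles make contact.\n</think>\n<answer>\n(A) Collision\n</answer>"
print(parse_verification(sample))
```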
Example Configuration#
Below is an example prompt configuration from alert_type_config.json:
{
  "version": "1.0",
  "alerts": [
    {
      "alert_type": "collision",
      "output_category": "Vehicle Collision",
      "prompts": {
        "system": "You are an expert AI assistant for video analysis. Your task is to determine whether a surveillance video depicts a **collision event** or **no collision**, based on the definitions below...",
        "user": "Based on the video, which category best describes what occurred at {place.name}:\n(A) Collision (physical contact or impact detected)\n(B) No collision (no contact or impact)...",
        "enrichment": "Provide a detailed description of this collision event. Include: 1) Objects/vehicles involved and their identifiers 2) Sequence of events leading to the collision 3) Estimated speeds and trajectories 4) Visible damage or safety concerns 5) Environmental factors such as lighting, weather, or obstacles"
      },
      "vlm_params": {
        "max_tokens": 2048,
        "temperature": 0.4,
        "num_frames": 10
      }
    },
    {
      "alert_type": "Stop Anomaly Module",
      "output_category": "Abnormal Vehicle Stop",
      "prompts": {
        "system": "You are an expert AI assistant for video analysis. Your task is to determine whether a surveillance video depicts **normal stopping behavior** or a **stop anomaly**, based on the strict definitions below...",
        "user": "Based on the video, which category best describes the observed behavior at {place.name} (if available):\n(A) Stop anomaly (unexpected or unsafe stop)\n(B) Normal stop (safe and expected behavior)..."
      }
    }
  ]
}
In this example:
The collision alert type uses output_category to display as "Vehicle Collision" in output, and includes an enrichment prompt for detailed event descriptions.
The Stop Anomaly Module alert type uses output_category but no enrichment prompt (enrichment is optional per alert type).
vlm_params can be added to any alert type to override the global VLM defaults; collision shows it as an illustration, but each alert type may define its own subset of overrides independently.
Pluggable Response Parser#
By default, the Alerts Microservice extracts a binary YES/NO verdict from the VLM response, which is appropriate for alert verification but limiting for other use cases such as classification, analytics, enrichment, or counting. A pluggable response parser lets you replace the built-in verification parsing with a custom class that returns arbitrary structured data.
When to Use#
Binary verification (confirm/reject an alert): use the built-in parser. No configuration needed.
Classification (PPE violation type, hazard category, etc.): use a pluggable parser.
Analytics (object counts, zone occupancy, etc.): use a pluggable parser.
Enrichment (scene description, recommended action, etc.): use a pluggable parser.
Parser Contract#
A parser is any Python class that exposes a parse(self, raw_response: str) -> dict method. No imports from Alert Bridge are required.
class MyParser:
    def parse(self, raw_response: str) -> dict:
        """Receive raw VLM response string, return a flat key-value dict."""
        ...
Requirements:
The class must be instantiable with no arguments (def __init__(self) -> None). Classes with required __init__ parameters or abstract base classes fail startup validation.
Implementations must be thread-safe: the parser instance is shared across all worker threads (up to alert_agent.num_workers) and the async dispatcher. Do not mutate instance state inside parse(); treat any attribute set in __init__ as read-only. Read-only caches (compiled regexes, LRU caches on pure functions) are fine.
Return a flat dictionary. Values may be primitives (string, int, bool, float) or stringified JSON for nested structures. The merge helper coerces non-string values to strings on the wire.
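The requirements above can be checked locally before deploying a parser. The sketch below is a rough stand-in for the kind of startup validation described here, not Alert Bridge's actual code: load_parser and build_parser are hypothetical names, and the error messages are invented for illustration.

```python
import importlib


def build_parser(cls):
    """Instantiate cls with no arguments and confirm it exposes a callable parse()."""
    try:
        instance = cls()  # fails for abstract classes and required __init__ parameters
    except TypeError as exc:
        raise ValueError(f"{cls.__name__} must be instantiable with no arguments: {exc}") from exc
    if not callable(getattr(instance, "parse", None)):
        raise ValueError(f"{cls.__name__} has no parse(raw_response) method")
    return instance


def load_parser(dotted_path: str):
    """Resolve 'package.module.ClassName' and return a validated parser instance."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)  # raises ModuleNotFoundError if missing
    cls = getattr(module, class_name, None)
    if cls is None:
        raise ValueError(f"Module {module_name!r} has no class {class_name!r}")
    return build_parser(cls)


class EchoParser:
    """Stateless example parser that satisfies the contract."""

    def parse(self, raw_response: str) -> dict:
        return {"raw": raw_response}


parser = build_parser(EchoParser)
print(parser.parse("hello"))
```

Running a check like this in a unit test catches a wrong dotted path or constructor signature before the container refuses to start.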
Configuration#
Set vlm.response_parser in config.yml to the dotted Python path of the parser class:
vlm:
  response_parser: "parsers.my_classifier.MyClassifier"
Alert Bridge imports the module and instantiates the class at startup. Validation errors (missing module, missing class, no parse method, required __init__ parameters) abort startup with a descriptive error message.
Example Parser#
A PPE violation classifier that strips markdown code fences before parsing JSON:
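import json
import re

# Matches a response wrapped entirely in a markdown code fence (optional language tag).
_RE_MD_FENCE = re.compile(r'^```(?:\w+)?\s*\n(.*?)```\s*$', re.DOTALL)


class PPEClassifier:
    def parse(self, raw_response: str) -> dict:
        text = raw_response.strip()
        # Strip the surrounding code fence, if present, before decoding.
        m = _RE_MD_FENCE.match(text)
        if m:
            text = m.group(1).strip()
        # json.loads raises on malformed JSON; the exception is reported as a parser failure.
        data = json.loads(text)
        return {
            "classification": data.get("label", "unknown"),
            "severity": data.get("severity", "unknown"),
            "confidence": data.get("confidence", 0),
            "reasoning": data.get("reasoning", ""),
        }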
Output Schema#
Parser output is JSON-serialized into info["vlm_response"]. The outer info shape (transport metadata such as videoSource, verificationResponseCode, sensorId) is identical to verification mode, so existing Elasticsearch mappings, Kibana dashboards, and downstream consumers require no changes — only an additive vlm_response field in the mdx-vlm-incidents index.
Successful pluggable-parser response:
{
  "info": {
    "sensorId": "warehouse-cam-03",
    "category": "ppe-violation",
    "verdict": "",
    "vlm_response": "{\"classification\": \"no-spotter\", \"severity\": \"high\", \"confidence\": \"0.95\", \"reasoning\": \"...\"}",
    "videoSource": "rtsp://cam/stream",
    "verificationResponseCode": "200",
    "verificationResponseStatus": "OK"
  }
}
The verification path (info["reasoning"]) and the pluggable-parser path (info["vlm_response"]) are disjoint — a given message contains exactly one of the two body fields, never both.
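The example above shows confidence arriving as the string "0.95" even though the parser returned a number. One way to picture that coercion is sketched below; to_vlm_response is a hypothetical name, and using JSON encoding for non-string values (so True becomes "true" and nested structures become JSON text) is an assumption about how the merge helper behaves.

```python
import json


def to_vlm_response(parser_output: dict) -> str:
    """Flatten parser output to string values and JSON-serialize the result."""
    if not isinstance(parser_output, dict):
        raise TypeError("parser output must be a dict")
    flat = {}
    for key, value in parser_output.items():
        # Strings pass through; everything else is JSON-encoded (assumed behavior).
        flat[str(key)] = value if isinstance(value, str) else json.dumps(value)
    return json.dumps(flat)


encoded = to_vlm_response(
    {"classification": "no-spotter", "severity": "high", "confidence": 0.95, "zones": ["A", "B"]}
)
print(encoded)
```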
Error Handling#
If the parser raises an exception or returns a non-dict value, Alert Bridge emits a distinct error event:
| Field | Value on parser failure |
|---|---|
| | |
| | |
| | |
| | |
| vlm_response | Not emitted (no valid parser output to serialize). The raw VLM response is logged at WARN level for operator inspection. |
The "Pluggable parser failed: ..." status prefix lets operators distinguish parser crashes from upstream VLM service failures ("VLM service internal error", "VLM response validation failed"). Parser-error events are routed through the error publish path; they never land on the success topic.
Integration Steps (End-to-End)#
The following walkthrough takes a PPE-violation classifier from zero to a working pipeline. Adapt the field names to your own use case.
Step 1 — Design the VLM output JSON shape
Decide what fields you want the VLM to produce. For a PPE classifier:
{
  "label": "no-spotter",   // one of [no-helmet, no-vest, no-spotter, compliant]
  "severity": "high",      // one of [low, medium, high]
  "confidence": 0.95,      // float between 0 and 1
  "reasoning": "..."       // short explanation
}
Step 2 — Update alert_type_config.json with a prompt that produces that shape
The prompt MUST instruct the VLM to return JSON with the exact field names your parser will read. Co-design the prompt and parser together.
{
  "version": "1.0",
  "alerts": [
    {
      "alert_type": "ppe-violation",
      "output_category": "PPE Violation",
      "prompts": {
        "system": "You are a safety classification system. Always respond in valid JSON only, no other text.",
        "user": "Classify this scenario at {place.name}. Return JSON with exactly these fields:\n- label: one of [no-helmet, no-vest, no-spotter, compliant]\n- severity: one of [low, medium, high]\n- confidence: float between 0 and 1\n- reasoning: short explanation"
      }
    }
  ]
}
Note
When using a pluggable parser, the prompt no longer needs the <think>...</think><answer>...</answer> Cosmos Reason format; that format is for the built-in verification parser. Tell the VLM to return plain JSON instead. VLMs commonly wrap JSON in markdown code fences (```json ... ```); the parser must strip them.
Step 3 — Write the parser class (parsers/ppe_classifier.py)
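import json
import re

# Matches a response wrapped entirely in a markdown code fence (optional language tag).
_RE_MD_FENCE = re.compile(r'^```(?:\w+)?\s*\n(.*?)```\s*$', re.DOTALL)


class PPEClassifier:
    def parse(self, raw_response: str) -> dict:
        text = raw_response.strip()
        # Strip the surrounding code fence, if present, before decoding.
        m = _RE_MD_FENCE.match(text)
        if m:
            text = m.group(1).strip()
        # json.loads raises on malformed JSON; the exception is reported as a parser failure.
        data = json.loads(text)
        return {
            "classification": data.get("label", "unknown"),
            "severity": data.get("severity", "unknown"),
            "confidence": data.get("confidence", 0),
            "reasoning": data.get("reasoning", ""),
        }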
The output keys (classification, severity, confidence, reasoning) are what end up serialized inside info["vlm_response"] — you can rename them freely; consumers only see the parser’s output.
Step 4 — Mount the parser directory into the container
Place an empty __init__.py and your parser file under a host directory and mount it to /app/parsers/ (see Deployment below for the full layout).
Step 5 — Set vlm.response_parser in config.yml
vlm:
  response_parser: "parsers.ppe_classifier.PPEClassifier"
The path is the dotted module name relative to /app/ — parsers.ppe_classifier resolves to /app/parsers/ppe_classifier.py and PPEClassifier is the class inside that file.
Step 6 — Restart the container
docker compose restart alert-bridge
On startup, look for the log line Pluggable response parser active: 'parsers.ppe_classifier.PPEClassifier'. A startup error here means the dotted path or class signature is wrong; the container will exit instead of running with a broken parser.
Step 7 — Verify with a test alert
Submit a test alert via the HTTP API with category matching your alert_type:
curl -X POST http://<HOST_IP>:9080/api/v1/verification/ondemand \
  -H "Content-Type: application/json" \
  -d '{
    "sensorId": "warehouse-cam-03",
    "timestamp": "2026-03-17T10:00:00Z",
    "end": "2026-03-17T10:00:10Z",
    "category": "ppe-violation",
    "place": {"name": "warehouse-zone-A"},
    "info": {
      "media_urls": ["http://<HOST_IP>:30888/vst/sim/media/ppe-clip.mp4"],
      "media_type": "video"
    }
  }'
Then inspect the mdx-vlm-incidents Elasticsearch index — see Consuming the Output Downstream below.
Consuming the Output Downstream#
Pluggable parser output lands in info["vlm_response"] as a JSON-encoded string (per the NvSchema map<string,string> contract). Downstream consumers must deserialize it before reading individual fields.
Reading from Elasticsearch
The mdx-vlm-incidents index stores each verified alert as a document. Query by category and decode info.vlm_response client-side:
curl -s "http://<HOST_IP>:9200/mdx-vlm-incidents/_search?pretty" \
  -H "Content-Type: application/json" -d '{
    "size": 5,
    "query": {"term": {"info.category": "ppe-violation"}},
    "_source": ["sensorId", "info.vlm_response", "info.verdict", "info.verificationResponseStatus"]
  }'
Each info.vlm_response value comes back as a JSON-encoded string; deserialize once to access the parser’s structured fields.
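As a sketch of that client-side decode step, the function below walks a standard Elasticsearch search response (hits.hits[]._source) and deserializes info.vlm_response for each document. decode_hits is a hypothetical helper name, and the sample response is invented for illustration; documents with a missing or undecodable vlm_response are skipped.

```python
import json
from typing import Iterator, Tuple


def decode_hits(search_response: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (sensorId, decoded parser fields) for each hit that carries vlm_response."""
    for hit in search_response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        raw = source.get("info", {}).get("vlm_response")
        if not raw:
            continue  # verification-mode documents carry no vlm_response
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip values that are not valid JSON
        if isinstance(parsed, dict):
            yield source.get("sensorId"), parsed


# Invented sample response with the standard Elasticsearch envelope.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"sensorId": "warehouse-cam-03",
                         "info": {"vlm_response": "{\"classification\": \"no-spotter\", \"severity\": \"high\"}"}}},
            {"_source": {"sensorId": "warehouse-cam-04", "info": {"verdict": "confirmed"}}},
        ]
    }
}
for sensor_id, fields in decode_hits(sample_response):
    print(sensor_id, fields["classification"], fields["severity"])
```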
Reading from Kafka
When event_bridge.sinkType: "kafka" is enabled, verified alerts are published to mdx-vlm-incidents. A minimal Python consumer:
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "mdx-vlm-incidents",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode()),
)

for msg in consumer:
    info = msg.value.get("info", {})
    parsed = json.loads(info.get("vlm_response", "{}"))  # deserialize the string
    print(info.get("category"), parsed.get("classification"), parsed.get("severity"))
The double-decode (Kafka payload → outer message → inner vlm_response string) is intentional: the inner JSON is opaque to the alert pipeline, which lets parser authors evolve fields without breaking the schema.
Viewing in Kibana
The vlm_response field appears as a single string column by default. To filter or chart on individual parser fields, define a Kibana scripted field (or use a runtime field on Elasticsearch 7.11+) that parses the JSON:
def m = /\"severity\"\s*:\s*\"([^\"]+)\"/.matcher(doc['info.vlm_response.keyword'].value);
return m.find() ? m.group(1) : null;
Repeat for each field you want to chart. For richer use cases, run a small ingest pipeline that pre-extracts parser fields into top-level Elasticsearch fields; that approach is out of scope here but follows standard ES ingest-node patterns.
Deployment#
Parser modules live outside the Alert Bridge image and are mounted into the container at /app/parsers/:
warehouse/vlm-as-verifier/
  configs/
    config.yml
    alert_type_config.json
  parsers/               # mounted into /app/parsers/
    __init__.py          # empty, required for Python package
    my_classifier.py     # your parser
In compose.yml, add the parser directory as a read-only volume:
volumes:
- ${VLM_AS_VERIFIER_CONFIG_FILE}:/app/configs/config.yml:ro
- ${VLM_AS_VERIFIER_ALERT_TYPE_CONFIG_FILE}:/app/alert_type_config.json:ro
- ${VLM_AS_VERIFIER_PARSERS_DIR:-/dev/null}:/app/parsers/:ro
To add or change parsers, drop a .py file into the parsers/ folder, update vlm.response_parser in config.yml, and restart the container. No image rebuild is required.
Note
When migrating from the legacy vlm.custom_parser_module registry mechanism, vlm.response_parser takes precedence on the default VLM path. Configuring both keys produces a startup WARN. Prefer vlm.response_parser for new integrations and remove vlm.custom_parser_module once migrated.
Alert Messages and Schema Format#
The Alerts Microservice supports NvSchema message formats for both alerts and incidents:
nv.Incident: For incident-type events requiring verification.
nv.Behavior: For behavior/alert-type events requiring verification.
Both Protobuf and JSON formats are supported for message serialization.
Note
For the complete schema definitions of nv.Incident and nv.Behavior messages, refer to the Protobuf Schema documentation.
Alert Ingestion Endpoints#
HTTP API
POST /api/v1/alerts: Submit alerts in nv.Behavior format (JSON or Protobuf).
POST /api/v1/incidents: Submit incidents in nv.Incident format (JSON or Protobuf).
Responses: 202 Accepted (queued), 422 (validation error), 415 (unsupported media type), 500 (internal error).
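A client-side sketch of the HTTP ingestion path, using only the Python standard library, is shown below. The helper names (build_incident_request, describe_status) and the base URL are assumptions; port 9080 is borrowed from the curl example elsewhere in this document. The request is only constructed here; passing it to urllib.request.urlopen would send it.

```python
import json
import urllib.request


def build_incident_request(base_url: str, incident: dict) -> urllib.request.Request:
    """Build a JSON POST to /api/v1/incidents without sending it."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/incidents",
        data=json.dumps(incident).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_status(status: int) -> str:
    """Map the documented response codes to a short description."""
    return {
        202: "accepted and queued",
        415: "unsupported media type",
        422: "validation error",
        500: "internal error",
    }.get(status, "unexpected status")


request = build_incident_request(
    "http://localhost:9080",  # assumed host and port
    {"sensorId": "Lafayette_Agnew", "category": "collision"},
)
print(request.get_method(), request.full_url)
print(describe_status(202))
```

Because a 202 only means the alert was queued, the verification result still has to be read from the configured sink afterwards.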
Kafka Topics (configurable)
Input: mdx-incidents (nv.Incident), mdx-alerts (nv.Behavior)
Output: mdx-vlm-incidents, mdx-vlm-alerts (when Kafka sink enabled)
Key Fields#
The following table describes key fields in alert messages:
| Field | Description |
|---|---|
| sensorId | Camera or sensor identifier used for video retrieval. |
| timestamp | Event start time in ISO 8601 format. |
| end | Event end time in ISO 8601 format. |
| objectIds | List of tracked object IDs involved in the event. |
| category | Alert type used for prompt matching (e.g., “collision”, “Stop Anomaly Module”). |
| place.name | Location identifier (e.g., intersection name, zone). |
| info | Additional metadata; extended with VLM verification results in responses. |
Verification Response#
Verified alerts include the original fields plus an extended info section containing:
| Field | Description |
|---|---|
| verificationResponseCode | HTTP-like numeric code: 200 = success, 4xx = client errors, 5xx = server/VLM errors. |
| verificationResponseStatus | Error description when non-200 status. |
| verdict | One of: |
| reasoning | VLM reasoning trace extracted from the |
| vlm_response | JSON-serialized parser output. Emitted only when |
| | Structured error classification on failure. One of: |
Example Request#
{
  "sensorId": "Lafayette_Agnew",
  "timestamp": "2025-09-11T00:08:27.822Z",
  "end": "2025-09-11T00:09:22.122Z",
  "objectIds": [
    "958741182",
    "958750871",
    "958834290",
    "958730631"
  ],
  "place": {
    "name": "city=Montague/intersection=Lafayette_Agnew"
  },
  "analyticsModule": {
    "id": "Collision Detection Module",
    "description": "Potential collision detected between 4 vehicles"
  },
  "category": "collision",
  "isAnomaly": true,
  "info": {
    "location": "42.48837572978232,-90.73894264480816,0.0",
    "primaryObjectId": "958750871"
  }
}
Example Response#
{
  "sensorId": "Lafayette_Agnew",
  "timestamp": "2025-09-11T00:08:27.822Z",
  "end": "2025-09-11T00:09:22.122Z",
  "objectIds": [
    "958741182",
    "958750871",
    "958834290",
    "958730631"
  ],
  "place": {
    "name": "city=Montague/intersection=Lafayette_Agnew"
  },
  "analyticsModule": {
    "id": "Collision Detection Module",
    "description": "Potential collision detected between 4 vehicles"
  },
  "category": "collision",
  "isAnomaly": true,
  "info": {
    "location": "42.48837572978232,-90.73894264480816,0.0",
    "primaryObjectId": "958750871",
    "verificationResponseCode": "200",
    "verificationResponseStatus": "OK",
    "reasoning": "The video shows vehicle 958750871 approaching the intersection...",
    "verdict": "confirmed"
  }
}
Scaling and Performance Tuning#
The Alerts Microservice supports concurrent processing to maximize throughput. This section describes the key configuration parameters for scaling the service to meet your deployment requirements.
Concurrency Model#
The Alerts Microservice uses a main thread with a fixed-size worker pool for concurrent processing:
Main Thread: Polls alerts from Kafka (or receives via HTTP API) and dispatches them to available workers.
Worker Threads: Each worker processes an assigned alert through the full pipeline:
Retrieves video URL from VIOS for the alert time window
Renders the prompt with alert context
Passes video URL, prompts and other required inputs to VLM backend for verification
Publishes the verified result to configured sinks
Workers operate independently, allowing multiple alerts to be processed in parallel. The main thread blocks when all workers are busy, providing natural backpressure. The end-to-end latency of external services (VLM, VIOS) directly bounds throughput per worker.
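The backpressure behavior described above can be sketched with a thread pool guarded by a semaphore: the dispatching thread blocks in submit() whenever every worker is busy. This is an illustration of the pattern only, not the service's implementation; BoundedDispatcher and the demo handler are hypothetical.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class BoundedDispatcher:
    """Fixed-size worker pool whose submit() blocks while all workers are busy."""

    def __init__(self, num_workers: int) -> None:
        self._slots = threading.BoundedSemaphore(num_workers)
        self._pool = ThreadPoolExecutor(max_workers=num_workers)

    def submit(self, handler: Callable[[Any], Any], alert: Any) -> Future:
        self._slots.acquire()  # backpressure: wait here until a worker is free
        try:
            future = self._pool.submit(handler, alert)
        except BaseException:
            self._slots.release()
            raise
        # Free the slot when the worker finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _done: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def demo_handler(alert: int) -> int:
    time.sleep(0.01)  # stands in for the VIOS lookup and VLM call
    return alert * 2


dispatcher = BoundedDispatcher(num_workers=2)
futures = [dispatcher.submit(demo_handler, i) for i in range(6)]
results = [f.result() for f in futures]
dispatcher.shutdown()
print(results)
```

With this shape, a slow VLM backend slows the polling loop itself rather than letting an unbounded queue build up in memory.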
Key Scaling Parameters#
The following parameters control concurrency and throughput:
| Parameter | Default | Impact |
|---|---|---|
| num_workers | | Number of concurrent worker threads. Increasing this value scales alert processing throughput roughly linearly until constrained by CPU/memory or downstream service limits (VLM, VIOS). |
| chunk_size | | Number of alerts processed per batch within a worker. Higher values reduce overhead but increase memory usage. |
| max_poll_records | | Maximum records fetched per Kafka poll. Controls batch size for ingestion; higher values improve throughput but increase memory and latency variance. |
| poll_timeout | | Kafka consumer poll wait timeout (ms). Lower values reduce latency for sparse traffic; higher values reduce CPU usage. |
| max_poll_interval_ms | | Maximum interval between polls before consumer is considered failed. Must exceed worst-case VLM processing time multiplied by batch size. |
Scaling Guidelines#
Vertical Scaling (Single Instance)
Increase num_workers: Start with the number of CPU cores available. Monitor CPU utilization and increase until the VLM backend becomes the bottleneck.
Tune chunk_size: For high-volume deployments, increase to 2-4 to reduce per-message overhead. Keep at 1 for low-latency requirements.
Adjust max_poll_records: Match to num_workers × chunk_size for optimal batching. Avoid setting it too high, to prevent memory pressure.
Horizontal Scaling (Multiple Instances)
Kafka Consumer Groups: Multiple instances of the Alerts Microservice can share the same event_bridge.kafka_source.group_id to distribute load across partitions.
Partition Alignment: Ensure Kafka topic partitions >= number of consumer instances for effective load distribution.
Stateless Design: The Alerts Microservice is stateless; scale horizontally by adding replicas under Kubernetes or as additional Docker instances.
Performance Tuning Example#
For a deployment targeting 10 alerts/second with average VLM latency of 0.5 seconds:
alert_agent:
  num_workers: 5                # 10 alerts/sec × 0.5 sec latency = 5 concurrent
  chunk_size: 1                 # Process one alert at a time for consistent latency
kafka:
  max_poll_records: 5           # Match worker count
  poll_timeout: 100             # Low latency polling
  max_poll_interval_ms: 30000   # 30 seconds sufficient for fast VLM
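The sizing arithmetic in this example follows Little's law: the number of alerts in flight equals arrival rate times latency. A small helper makes it easy to rerun the numbers for other targets. Both function names and the safety_factor parameter are assumptions introduced for illustration.

```python
import math


def required_workers(alerts_per_second: float, vlm_latency_seconds: float) -> int:
    """Concurrent workers needed to sustain the rate (Little's law), at least 1."""
    return max(1, math.ceil(alerts_per_second * vlm_latency_seconds))


def min_poll_interval_ms(
    worst_case_vlm_seconds: float, batch_size: int, safety_factor: float = 2.0
) -> int:
    """Lower bound for max_poll_interval_ms: worst-case VLM time x batch size, with headroom."""
    return math.ceil(worst_case_vlm_seconds * batch_size * safety_factor * 1000)


print(required_workers(10, 0.5))      # the deployment target in this section
print(min_poll_interval_ms(2.0, 5))   # assumed 2 s worst-case VLM time, batch of 5
```

Treat the result as a starting point; measured VLM and VIOS latency under load usually exceeds the average used here.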
Note
For Prometheus metric names, labels, and scrape configuration, see Metrics. In addition, monitor worker utilization (all workers busy indicates a need to scale up), Kafka consumer lag, and host CPU and memory.
Metrics#
Prometheus is a common open-source monitoring stack: a Prometheus server scrapes HTTP metrics endpoints on a schedule, stores the samples as labeled time series, and exposes the PromQL query language for charts and alerts. See the Prometheus documentation for terminology and query examples.
With metrics enabled, the Alerts Microservice serves Prometheus text format on /metrics; the port and enable flag are in the Configuration subsection.
Metrics record pipeline latency and event throughput at the moment of processing. Series are held in memory as Prometheus histograms and counters; the /metrics endpoint returns cumulative bucket counts, sums, and counts from which Prometheus computes rates and approximate quantiles at query time. No per-event raw samples are stored.
Configure Prometheus to scrape that URL, then use the Prometheus or Grafana UI to run PromQL. Deployments that include the blueprint observability bundle can follow Observability for UI URLs.
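To see how approximate quantiles fall out of cumulative bucket counts, the sketch below reproduces the linear interpolation idea behind PromQL's histogram_quantile in simplified form. It is a teaching aid, not Prometheus code: it assumes the lowest bucket starts at 0, and the bucket values are invented.

```python
import math
from typing import Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 <= q <= 1 or not buckets:
        raise ValueError("q must be in [0, 1] and buckets must be non-empty")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # assumed lower edge of the first bucket
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # cannot interpolate into +Inf; use highest finite bound
            in_bucket = cumulative - lower_count
            if in_bucket == 0:
                return upper_bound  # degenerate case (for example q == 0 with an empty bucket)
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return lower_bound


# Invented cumulative counts for a latency histogram with buckets at 0.1 s, 0.5 s, 1 s.
latency_buckets = [(0.1, 10), (0.5, 60), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.5, latency_buckets))
print(histogram_quantile(0.95, latency_buckets))
```

The estimate is only as fine-grained as the bucket boundaries, which is why quantiles from these metrics are described as approximate.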
Configuration#
Prometheus exposure is controlled by environment variables:
| Option | Default | Meaning |
|---|---|---|
| | | Set to |
| | | Port for the Prometheus scrape server. Scrape |
Metric shape and coverage are affected by these config.yml keys:
| Key | Default | Effect |
|---|---|---|
| | | Emits the opt-in |
| | | When enabled, dropped events increment |
| | | When enabled, VLM rate-limit drops increment |
| | | When enabled, confirmed-verdict short-circuits increment |
| | | Not required for Prometheus metrics. It only controls whether latency fields are written into output documents for Elasticsearch-based analysis. |
When per-sensor labels are enabled, missing, non-string, empty, or over-128-character sensor IDs are folded into sensorId="unknown". Valid string IDs, including IDs with punctuation such as /, :, ?, or ., pass through unchanged after trimming whitespace. The recorder caps distinct sensorId label values at 128 per process; any additional unseen ID is recorded as sensorId="unknown_overflow".
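The label-folding rules above can be expressed as a small normalizer. This is a sketch under stated assumptions, not the recorder's code: SensorLabeler is a hypothetical name, the length check is applied after trimming, whitespace-only IDs are treated as empty, and the "unknown" label does not count toward the 128-value cap.

```python
import threading


class SensorLabeler:
    """Normalize sensor IDs into bounded-cardinality label values."""

    MAX_LENGTH = 128     # longer IDs fold into "unknown"
    MAX_DISTINCT = 128   # cap on distinct label values per process

    def __init__(self) -> None:
        self._seen = set()
        self._lock = threading.Lock()

    def label(self, sensor_id) -> str:
        if not isinstance(sensor_id, str):
            return "unknown"  # missing or non-string
        trimmed = sensor_id.strip()
        if not trimmed or len(trimmed) > self.MAX_LENGTH:
            return "unknown"  # empty or too long
        with self._lock:
            if trimmed in self._seen:
                return trimmed
            if len(self._seen) >= self.MAX_DISTINCT:
                return "unknown_overflow"  # cap reached; unseen IDs are folded
            self._seen.add(trimmed)
            return trimmed


labeler = SensorLabeler()
print(labeler.label(" cam/1:a?.b "))  # punctuation passes through after trimming
print(labeler.label(None))
```

Capping distinct label values keeps the number of per-sensor time series bounded even if upstream sensor IDs are noisy.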
Pipeline histograms#
These histograms expose _bucket, _sum, and _count series.
| Metric | Measures |
|---|---|
| | Time from event end to Kafka publish (upstream path). |
| | Time from Kafka publish to microservice consume. |
| | Time from consume to worker assignment. |
| | VIOS (VST) video URL fetch duration. |
| | Effective clip length requested from VIOS (VST). |
| | Each VLM attempt, including retries and failed API calls. |
| | Time from worker assignment to sink-ready completion. |
| | Time from event end to sink-ready completion (end-to-end view). |
Event counters#
| Metric | Labels | Meaning |
|---|---|---|
|  | (none) | Events entering per-event processing after filters. |
|  |  | Batch-level drops before per-event processing. |
|  | (none) | Events skipped because Redis already held a confirmed verdict. |
|  |  | Events completed by the per-event recorder. |
|  |  | Events that failed verification. |
Drop reasons (alert_bridge_events_dropped_total): end_time_delta, dedup, rate_limit. Unexpected call-site values are defensively folded into unknown.
Verdicts (alert_bridge_events_total): confirmed, rejected, verification-failed, unknown.
Verification failure reasons (for breakdowns and operations that label by failure): vst_timeout, vst_overloaded, vst_not_found, vst_unavailable, vst_client_error, vst_server_error, vst_unknown, url_validation, vlm_parse_failure, vlm_timeout, vlm_connection_error, vlm_server_error, vlm_invalid_payload, no_prompt, redis_unavailable, unknown.
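The defensive folding of drop reasons can be illustrated with a short sketch. The function names fold_drop_reason and count_drops are hypothetical; only the three reason values and the unknown fallback come from the list above.

```python
from collections import Counter

# Drop reasons documented for alert_bridge_events_dropped_total.
KNOWN_DROP_REASONS = frozenset({"end_time_delta", "dedup", "rate_limit"})


def fold_drop_reason(reason):
    """Return a documented reason, or "unknown" for anything unexpected."""
    if isinstance(reason, str) and reason in KNOWN_DROP_REASONS:
        return reason
    return "unknown"


def count_drops(reasons):
    """Tally drops per label value, the way a labeled counter would."""
    return Counter(fold_drop_reason(reason) for reason in reasons)
```

Folding at the recording site means a mistyped reason at a call site shows up as unknown in dashboards instead of creating a new, unbounded label value.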
Per-sensor variants#
When alert_agent.metrics.per_sensor_labels is true, the service also emits the per-sensor series below. Histogram names use _by_sensor_seconds; counters use _by_sensor_total.
Aggregate histogram |
Per-sensor variant |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Aggregate counter |
Per-sensor variant |
|---|---|
|
|
|
|
|
|
|
|
|
|
Additional metrics#
Async I/O and sink
| Metric | Labels | Meaning |
|---|---|---|
|  |  | Duration of async-capable Redis, VST, and sink operations, including sync fallbacks. |
|  |  | Async operations that fell back to a synchronous path. |
|  | (none) | Current number of in-flight async sink operations. |
Known mode values are sync, async, sync_fallback, and async_submit. Known result values are success, error, timeout, and cancelled. Known fallback reason values are timeout, error, submit_error, and executor_unavailable.
Realtime and API (where enabled in the deployment)
| Metric | Labels | Meaning |
|---|---|---|
|  | (none) | Current in-memory realtime rules registered with the service. |
|  | (none) | Current durable realtime rules with |
|  | (none) | Realtime rules created and committed to the live registry. |
|  | (none) | Realtime rules that reached durable |
|  | (none) | Realtime rules deleted from the live registry. |
|  |  | Realtime rule creation or caption-task failures by stage. |
|  |  | Replay loops that actually ran, labelled |
|  | (none) | Individual rules that failed during replay. |
|  |  | Duration of RTVI VLM calls made by realtime rule management. |
|  |  | Failed RTVI VLM calls made by realtime rule management. |
|  | (none) | Duration of Elasticsearch incident queries. |
|  | (none) | Failed Elasticsearch incident queries. |
Known realtime failure stage values are validation, es_persist, start_stream, missing_stream_id, generate_captions, es_update, caption_task_http, and caption_task_crash. Known RTVI method values are start_stream, generate_captions, stop_stream, and stop_captions. Replay short-circuits such as persistence-disabled or replay-already-in-flight do not increment alert_bridge_replay_invocations_total.
Sample PromQL queries#
The examples use sum by (le) (...) so that quantiles aggregate correctly across multiple scrape targets or multiprocess workers. When alert_agent.metrics.per_sensor_labels is true, add sensorId to the grouping (for example, sum by (sensorId, le) (...)) for per-camera views.
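As a rough illustration of what sum by (le) does before a quantile is computed, the sketch below merges cumulative bucket counts from two scrape targets, keyed by their le bound. The function name sum_by_le and the counts are hypothetical, and plain counts stand in for the per-second rates that PromQL would aggregate.

```python
from collections import Counter


def sum_by_le(per_target_buckets):
    """Add up bucket counts that share the same le bound across targets."""
    merged = Counter()
    for buckets in per_target_buckets:
        merged.update(buckets)  # Counter.update adds values key by key
    return dict(merged)


# Hypothetical cumulative counts from a fast worker and a slow worker.
fast_worker = {"0.5": 8, "1.0": 10, "+Inf": 10}
slow_worker = {"0.5": 1, "1.0": 4, "+Inf": 10}
combined = sum_by_le([fast_worker, slow_worker])
```

A quantile taken from the combined buckets reflects all 20 observations. Averaging the two workers' individual quantiles would not, which is why the grouping is applied inside histogram_quantile.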
Quickly confirm that alert_bridge series are present (default port shown; use your PROMETHEUS_PORT):
curl -s http://localhost:9081/metrics | grep alert_bridge
End-to-end latency (P50 / P90 / P99):
histogram_quantile(0.50, sum by (le) (rate(alert_bridge_e2e_duration_seconds_bucket[5m])))
histogram_quantile(0.90, sum by (le) (rate(alert_bridge_e2e_duration_seconds_bucket[5m])))
histogram_quantile(0.99, sum by (le) (rate(alert_bridge_e2e_duration_seconds_bucket[5m])))
VLM stage: quantiles and average latency over a range:
histogram_quantile(0.50, sum by (le) (rate(alert_bridge_vlm_duration_seconds_bucket[5m])))
histogram_quantile(0.90, sum by (le) (rate(alert_bridge_vlm_duration_seconds_bucket[5m])))
histogram_quantile(0.99, sum by (le) (rate(alert_bridge_vlm_duration_seconds_bucket[5m])))
sum(rate(alert_bridge_vlm_duration_seconds_sum[5m])) / sum(rate(alert_bridge_vlm_duration_seconds_count[5m]))
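The last query divides the rate of the _sum series by the rate of the _count series. Because both rates cover the same window, the window length cancels and the result is the mean duration of the observations made in that window. A minimal sketch of the arithmetic, using a hypothetical helper name and made-up sample values, and ignoring the counter resets that PromQL's rate handles:

```python
def average_latency(sum_start, sum_end, count_start, count_end):
    """Mean observation duration between two scrapes of _sum and _count.

    Returns None when no observations were recorded in the window.
    """
    observations = count_end - count_start
    if observations <= 0:
        return None
    return (sum_end - sum_start) / observations


# 20 VLM attempts took 30 seconds in total during the window.
mean_seconds = average_latency(120.0, 150.0, 40, 60)  # 1.5
```

The mean is sensitive to outliers, so it is usually read alongside the P50, P90, and P99 quantiles rather than in place of them.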
Kafka consumer lag (P90):
histogram_quantile(0.90, sum by (le) (rate(alert_bridge_kafka_lag_duration_seconds_bucket[5m])))
Worker throughput (events completed per second, all outcomes):
sum(rate(alert_bridge_worker_processing_seconds_count[1m]))
Drop rate by reason:
sum by (reason) (rate(alert_bridge_events_dropped_total[5m]))
Events per second by verdict:
sum by (verdict) (rate(alert_bridge_events_total[1m]))
Configuration#
The Alerts Microservice is configured via a YAML-based configuration file. The following tables describe key configuration parameters organized by category.
VIOS Configuration#
| Parameter | Default | Description |
|---|---|---|
|  |  | Base URL for VIOS APIs. |
|  |  | Endpoint for retrieving sensor/stream list. |
|  |  | Segment anchor mode ( |
|  |  | Clip segment duration in seconds. |
|  |  | Enable bounding box overlay on video segments. |
Kafka Configuration#
| Parameter | Default | Description |
|---|---|---|
|  |  | Kafka broker addresses. |
|  |  | Default Kafka consumer group ID. |
|  |  | Offset reset policy (earliest/latest). |
|  |  | Enable automatic offset commit. |
|  |  | Maximum records per poll. |
|  |  | Maximum interval between polls (ms). |
|  |  | Session timeout (ms). |
|  |  | Heartbeat interval (ms). |
|  |  | Kafka consumer poll wait timeout (ms). |
|  |  | Protobuf message type ( |
Event Bridge Configuration#
| Parameter | Default | Description |
|---|---|---|
|  |  | Source type ( |
|  |  | Sink type ( |
|  |  | Kafka source consumer group. |
|  |  | Kafka topic for incidents. |
|  |  | Kafka topic for alerts. |
VLM Configuration#
Note
For detailed VLM API documentation, refer to the Cosmos Reason2 NIM documentation.
| Parameter | Default | Description |
|---|---|---|
|  | (required) | OpenAI-compatible VLM endpoint URL. |
|  |  | VLM model name. |
|  |  | Maximum tokens for VLM responses. |
|  |  | Minimum pixel budget for video frames. |
|  |  | Maximum pixel budget for video frames. |
|  |  | Number of video frames to sample. |
|  |  | Enable frame sampling mode. |
|  |  | Sampling FPS when enabled. |
|  |  | VLM sampling temperature. |
|  |  | Per-request timeout in seconds for VLM API calls. Increase for larger models or longer videos. |
|  |  | Number of VLM request retries on API or validation failure. |
|  |  | When true, skips custom pixel and frame parameters and lets the VLM backend use its own defaults. |
|  | (unset) | Dotted Python path to a pluggable response parser class (for example, |
|  |  | Seconds to wait for VLM NIM readiness at startup before aborting. |
|  |  | Seconds between NIM health polls during warmup. |
|  |  | Timeout in seconds for the warmup inference request. |
|  |  | Number of warmup inference rounds to send at startup. Each round retries up to 3 times on failure. Set to |
Note
The min_pixels and max_pixels parameters must be set in accordance with the VLM’s maximum context window.
Note
VLM warmup runs automatically at startup by default and sends 3 inference rounds to prime the GPU model cache. To disable warmup entirely, set the environment variable VLM_WARMUP_ENABLED=false. To adjust the number of rounds, set vlm.warmup.num_requests in the config. The vlm.warmup.* parameters are optional; defaults work for most deployments.
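The warmup sequence described above (poll for readiness, then send a fixed number of inference rounds with retries) can be sketched as follows. This is an illustration only: run_warmup, check_health, and send_inference are hypothetical names, "retries up to 3 times" is interpreted here as at most 3 attempts per round, and a round whose attempts all fail is assumed to be skipped instead of aborting startup.

```python
import time


def run_warmup(check_health, send_inference, *, ready_timeout, poll_interval,
               num_requests=3, max_attempts=3,
               sleep=time.sleep, clock=time.monotonic):
    """Wait for the VLM backend, then send warmup inference rounds.

    check_health() returns True once the backend is ready.
    send_inference() raises an exception on failure.
    Returns the number of rounds that completed successfully.
    """
    deadline = clock() + ready_timeout
    while not check_health():
        if clock() >= deadline:
            raise TimeoutError("VLM backend was not ready before the timeout")
        sleep(poll_interval)

    completed = 0
    for _ in range(num_requests):
        for _attempt in range(max_attempts):
            try:
                send_inference()
            except Exception:
                continue  # assumption: retry the same round immediately
            completed += 1
            break
    return completed
```

The sleep and clock parameters exist only so the loop can be exercised without real delays; a caller would normally leave them at their defaults.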
Alerts Microservice Configuration#
| Parameter | Default | Description |
|---|---|---|
|  |  | Number of worker threads for concurrent processing. |
|  |  | Maximum stream size in minutes. |
|  |  | Default stream interval in minutes. |
|  |  | Skip VIOS lookup; use local media files directly. |
|  |  | Chunk size for processing. |
|  |  | Enable alert enrichment for generating detailed event descriptions after verification. |
|  |  | When |
Elasticsearch Configuration#
| Parameter | Default | Description |
|---|---|---|
|  |  | Enable Elasticsearch persistence. |
|  |  | Elasticsearch host URLs. |
|  |  | Elasticsearch index for verified incidents. |
|  |  | Elasticsearch index for verified alerts. |
Prompt Configuration#
| Parameter | Default | Description |
|---|---|---|
|  |  | Path to alert type configuration file. |
|  |  | Prefer prompts from alert payload over stored prompts. |
|  |  | Override stored prompts with config file on startup. |
Logging Configuration#
| Parameter | Default | Description |
|---|---|---|
|  |  | Application log level (DEBUG, INFO, WARNING, ERROR, CRITICAL). |
|  | (see config) | Log message format string. |
|  |  | Log level for third-party libraries. |
API Reference