Context-Aware RAG#
The VSS Context-Aware RAG is responsible for video search and summarization based on the dense captions generated by the data processing pipeline. It implements the following data pipelines to achieve this:
Data Ingestion:
Responsible for receiving and processing incoming documents.
Uses the context manager to add documents, process them using batching, and prepare them for subsequent retrieval.
Data Retrieval:
Focuses on extracting the relevant context in response to user queries.
Leverages Graph-RAG and Vector-RAG functions to deliver precise, context-aware answers.
Context-Aware RAG starts its own process and event loop so as not to block the main process and to improve performance.
Data Ingestion#
Parallel and Asynchronous Ingestion:
Documents are ingested in parallel and asynchronously processed to avoid blocking the main process.
Documents can be added with doc_index and doc_meta:
doc_index is used to uniquely identify the document (doc_0, doc_1, doc_2).
doc_meta is used to store additional metadata about the document ({stream_id: 0, timestamp: 1716393600}).
Processing is done in the Context-Aware RAG's separate process.
Documents can arrive in any order.
Batcher:
Batcher groups documents into fixed-size, duplicate-checked batches (using doc_id//batch_size).
When a batch fills up, downstream processing can be triggered immediately (for example, graph extraction).
Example Batching Process (with batch_size = 2):
If doc_4 arrives first, it is placed in batch 2 (doc_id 4 // batch_size 2 = 2).
When doc_0 arrives, it is placed in batch 0 (doc_id 0 // batch_size 2 = 0).
When doc_1 arrives, it completes batch 0 (doc_id 1 // batch_size 2 = 0).
Batch 0 is now full and triggers asynchronous processing: partial graph construction begins for documents 0 and 1.
This process continues until all documents arrive.
After all batches are processed, the final lexical graph is constructed.
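The batching behavior above can be sketched in a few lines of Python. This is an illustrative model only; the `Batcher` class and `add` method shown here are hypothetical, not the actual CA-RAG API:

```python
from collections import defaultdict

class Batcher:
    """Illustrative fixed-size, duplicate-checked batcher (hypothetical API)."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size
        self.batches = defaultdict(dict)  # batch index -> {doc_id: document}

    def add(self, doc_id, doc):
        """Place a document in its batch; return the batch when it fills up."""
        batch_idx = doc_id // self.batch_size
        batch = self.batches[batch_idx]
        if doc_id in batch:
            return None  # duplicate check: ignore repeated doc_ids
        batch[doc_id] = doc
        if len(batch) == self.batch_size:
            # A full batch would trigger downstream processing here,
            # for example partial graph extraction.
            return batch_idx, dict(sorted(batch.items()))
        return None

batcher = Batcher(batch_size=2)
batcher.add(4, "caption 4")                 # batch 2 is still partial
batcher.add(0, "caption 0")                 # batch 0 is still partial
full = batcher.add(1, "caption 1")          # batch 0 completes
assert full == (0, {0: "caption 0", 1: "caption 1"})
```

Because documents can arrive in any order, partially filled batches (like batch 2 above) simply wait until their remaining documents show up.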
Graph Ingestion#
```yaml
functions:
  ingestion_function:
    type: graph_ingestion
    params:
      batch_size: 1
    tools:
      db: graph_db
      llm: chat_llm
```
- Other optional parameters:
batch_size (int): Number of docs processed before graph writes. Default: 1.
embedding_parallel_count (int): Parallel workers for embeddings. Default: 1000.
duplicate_score_value (float): Score threshold for node deduplication. Default: 0.9.
node_types (list[str]): Node labels to include. Default: ["Person", "Vehicle", "Location", "Object"].
relationship_types (list[str]): Relationship types to include. Default: [].
deduplicate_nodes (bool): Merge similar nodes if true. Default: false.
disable_entity_description (bool): Skip generating entity descriptions. Default: true.
disable_entity_extraction (bool): Skip entity extraction entirely. Default: false.
chunk_size (int): Text chunk size for splitting during ingestion. Default: 500.
chunk_overlap (int): Overlap between chunks for splitting. Default: 10.
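The chunk_size and chunk_overlap parameters describe a sliding-window text split. A minimal sketch under that assumption (the actual splitter may differ, for example by respecting token or sentence boundaries):

```python
def split_text(text, chunk_size=500, chunk_overlap=10):
    """Split text into chunk_size-character chunks that overlap by chunk_overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = split_text("".join(str(i % 10) for i in range(980)))
assert len(chunks) == 2
assert chunks[0][-10:] == chunks[1][:10]  # consecutive chunks share 10 characters
```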
Vector Ingestion#
```yaml
functions:
  ingestion_function:
    type: vector_ingestion
    params:
      batch_size: 1
    tools:
      db: vector_db
      llm: chat_llm
```
- Other optional parameters:
batch_size (int): Number of docs processed in a run (metadata grouping). Default: 1.
custom_metadata (dict): Extra metadata to attach to stored docs. Default: {}.
is_user_specified_collection (bool): Use a user‑provided collection. Default: false.
Summarization#
CA-RAG provides the following methods for summarizing content.
Batch: This method performs summarization in two stages:
Batching: Groups together documents into batches and generates summaries for each batch.
Aggregation: Combines batch summaries using a secondary prompt (summary_aggregation).
This method is ideal for handling long videos.
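The two-stage flow can be sketched as follows; summarize stands in for an LLM call and is a placeholder, not the actual CA-RAG interface:

```python
def batch_summarize(captions, batch_size, summarize):
    """Two-stage summarization: per-batch summaries, then one aggregation pass."""
    # Stage 1 (batching): summarize each fixed-size group of captions.
    batch_summaries = [
        summarize(captions[i:i + batch_size])
        for i in range(0, len(captions), batch_size)
    ]
    # Stage 2 (aggregation): combine batch summaries; in CA-RAG this step
    # uses the summary_aggregation prompt.
    return summarize(batch_summaries)

# Toy "LLM" that joins its inputs, just to show the data flow.
result = batch_summarize(["a", "b", "c", "d", "e"], 2, lambda xs: "|".join(xs))
assert result == "a|b|c|d|e"
```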
Data Retrieval (Q&A)#
CA-RAG supports the following retrieval strategies:
Vector Retrieval - Semantic similarity-based retrieval with contextual compression
Graph Retrieval - Knowledge graph-based retrieval for entity relationships
VLM Retrieval - Visual Language Model retrieval for multimodal analysis
Chain of Thought (CoT) Retrieval - Advanced iterative retrieval with confidence scoring
Advanced Retrieval - Modular iterative planning and execution
Vector Retrieval#
Captions generated by the Vision-Language Model (VLM), along with their embeddings, are stored in Milvus DB or Elasticsearch.
Embeddings can be created using any embedding NIM.
By default, embeddings are created using nvidia/llama-3_2-nv-embedqa-1b-v2.
For a query, the top five most similar chunks are retrieved, re-ranked using any reranker NIM and passed to a Large Language Model (LLM) NIM to generate the final answer.
By default, the reranker NIM is set to nvidia/llama-3_2-nv-rerankqa-1b-v2.
Configuration Parameters
```yaml
functions:
  retriever_function:
    type: vector_retrieval
    params:
      top_k: 5
    tools:
      llm: chat_llm
      db: vector_db
      reranker: nvidia_reranker
```
- Optional Parameters:
top_k (int): Number of documents to retrieve semantically. Default: 5.
custom_metadata (dict, optional): Additional metadata for filtering. Default: {}.
is_user_specified_collection (bool, optional): Use a specific collection name. Default: false.
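The end-to-end flow (embed the query, take the top-k similar captions, rerank, then generate) can be sketched as below. The embed, rerank, and generate callables are placeholders for the embedding, reranker, and LLM NIMs, not the actual CA-RAG interfaces:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def vector_qa(query, store, embed, rerank, generate, top_k=5):
    """Illustrative retrieve -> rerank -> generate pipeline (not the CA-RAG API)."""
    q_vec = embed(query)
    # Score stored (embedding, caption) pairs against the query embedding.
    scored = sorted(store, key=lambda item: dot(q_vec, item[0]), reverse=True)
    candidates = [caption for _, caption in scored[:top_k]]
    context = rerank(query, candidates)  # reranker NIM reorders by relevance
    return generate(query, context)      # LLM NIM produces the final answer

store = [([1.0, 0.0], "cap A"), ([0.0, 1.0], "cap B"), ([0.9, 0.1], "cap C")]
answer = vector_qa("q", store,
                   embed=lambda q: [1.0, 0.0],
                   rerank=lambda q, docs: docs,
                   generate=lambda q, ctx: ctx[0],
                   top_k=2)
assert answer == "cap A"
```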
Graph Retrieval#
Graph Extraction: Entities and relationships are extracted from VLM captions using an LLM and stored in a GraphDB. Captions and embeddings, generated with any embedding NIM, are also linked to these entities.
Graph Retrieval: For a given query, relevant entities, relationships, and captions are retrieved from the GraphDB and passed to an LLM NIM to generate the final answer.
Configuration Parameters
```yaml
functions:
  retriever_function:
    type: graph_retrieval
    params:
      top_k: 5
    tools:
      llm: chat_llm
      db: graph_db
```
- Optional Parameters:
top_k (int): Number of chunks/entities to retrieve. Default: 5.
chat_history (bool, optional): Keep and summarize multi-turn chat history. Default: false.
VLM Retrieval#
Captions generated by the Vision-Language Model (VLM), along with their embeddings and video frame paths, are stored in different databases.
Video frames are stored in MinIO.
Based on the given user query, the most relevant chunk and related video frames are retrieved and passed to a Vision Language Model (VLM) NIM along with the query to generate the final answer.
Embeddings can be created using any embedding NIM.
By default, embeddings are created using nvidia/llama-3_2-nv-embedqa-1b-v2.
Configuration Parameters
```yaml
functions:
  retriever_function:
    type: vlm_retrieval
    params:
      top_k: 10
    tools:
      llm: chat_llm
      db: graph_db
      vlm: openai_llm
      image_fetcher: image_fetcher
```
Set SAVE_CHUNK_FRAMES_MINIO to true for image_fetcher to be used.
- Optional Parameters:
num_chunks (int, optional): For image extraction, number of chunks to sample. Default: 3.
num_frames_per_chunk (int, optional): Frames per chunk to sample. Default: 3.
max_total_images (int, optional): Cap on total extracted images. Default: 10.
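Assuming max_total_images caps the total frames sampled across chunks (an assumption; the exact interaction of these parameters is implementation-defined), the image budget works out as:

```python
def image_budget(num_chunks=3, num_frames_per_chunk=3, max_total_images=10):
    # Assumption: the cap applies to the product of chunks and frames per chunk.
    return min(num_chunks * num_frames_per_chunk, max_total_images)

assert image_budget() == 9           # 3 chunks x 3 frames, under the cap
assert image_budget(5, 3, 10) == 10  # capped by max_total_images
```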
Chain of Thought (CoT) Retrieval#
Chain of Thought Retrieval implements an advanced iterative approach that uses confidence-based decision making and question reformulation to provide accurate answers.
Key Features:
Iterative Retrieval: Performs multiple retrieval iterations (up to max_iterations) until a confident answer is found
Confidence Scoring: Uses a confidence threshold to determine answer quality (default: 0.7)
Question Reformulation: LLM can suggest updated questions to retrieve better database results
Chat History Integration: Maintains conversation context using the last 3 interactions
Visual Data Processing: Can request and analyze video frames when visual information is needed
Structured Response: Returns JSON-formatted responses with answer, confidence, and additional metadata
Retrieval Process:
Initial context retrieval based on the user question
Integration of relevant chat history from previous interactions
Iterative LLM evaluation with structured JSON response format
If confidence is below threshold:
Request additional context using reformulated questions
Process visual data if needed (when image features enabled)
Continue iteration until confident answer or max iterations reached
Return final answer with confidence score
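The retrieval process above can be sketched as a loop. Here retrieve and ask_llm are placeholders, not the actual CA-RAG classes; ask_llm is assumed to return the structured JSON fields answer, confidence, and updated_question:

```python
def cot_answer(question, retrieve, ask_llm,
               max_iterations=3, confidence_threshold=0.7):
    """Illustrative confidence-gated iterative retrieval loop."""
    context = list(retrieve(question))
    last = None
    for _ in range(max_iterations):
        # Structured LLM evaluation: {"answer", "confidence", "updated_question"}
        last = ask_llm(question, context)
        if last["confidence"] >= confidence_threshold:
            return last["answer"], last["confidence"]
        if last.get("updated_question"):
            # Reformulate the question and fetch additional context.
            question = last["updated_question"]
            context += retrieve(question)
    # Max iterations reached: return the last answer with its confidence.
    return last["answer"], last["confidence"]

responses = iter([
    {"answer": None, "confidence": 0.2,
     "updated_question": "What events occurred between timestamp 75 and 80?"},
    {"answer": "The box fell at 78s.", "confidence": 0.9, "updated_question": None},
])
answer, confidence = cot_answer("When did the box fall?",
                                retrieve=lambda q: [f"context for: {q}"],
                                ask_llm=lambda q, ctx: next(responses))
assert answer == "The box fell at 78s." and confidence == 0.9
```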
Configuration Parameters
```yaml
functions:
  retriever_function:
    type: cot_retrieval
    params:
      image: false
      top_k: 10
      max_iterations: 3
      confidence_threshold: 0.7
    tools:
      llm: chat_llm
      db: graph_db
      vlm: openai_llm # optional when image=true
      image_fetcher: image_fetcher # optional when image=true
```
Set SAVE_CHUNK_FRAMES_MINIO to true for image_fetcher to be used.
- Optional Parameters:
image (bool, optional): Attach extracted frames to the prompt where supported. Default: false.
num_chunks (int, optional): For image extraction, number of chunks to sample. Default: 3.
num_frames_per_chunk (int, optional): Frames per chunk to sample. Default: 3.
max_total_images (int, optional): Cap on total extracted images. Default: 10.
Prompt Configuration
The prompt configuration file is a YAML file that configures the prompts used by CoT Retrieval. If it is not provided, default prompts are used.
The file contains four prompts:
ADV_CHAT_TEMPLATE_IMAGE: The prompt for image retrieval.
ADV_CHAT_TEMPLATE_TEXT: The prompt for text retrieval.
ADV_CHAT_SUFFIX: The suffix appended to the prompt.
QUESTION_ANALYSIS_PROMPT: The prompt for question analysis.
Example prompt configuration file:
ADV_CHAT_TEMPLATE_IMAGE: |+
You are an AI assistant that answers questions based on the provided context.
The context includes retrieved information, relevant chat history, and potentially visual data.
The image context contains images if not empty.
Determine if more visual data (images) would be helpful to answer this question accurately.
For example, if the question is about color of an object, location of an object, or other visual information, visual data is needed.
If image context is not empty, you likely do not need more visual data.
Use all available context to provide accurate and contextual answers.
If the fetched context is insufficient, formulate a better question to
fetch more relevant information. Do not reformulate the question if image data is needed.
You must respond in the following JSON format:
{
"description": "A description of the answer",\
"answer": "your answer here or null if more info needed",\
"updated_question": "reformulated question to get better database results" or null,\
"confidence": 0.95, // number between 0-1\
"need_image_data": "true" // string indicating if visual data is needed\
}
Example 1 (when you have enough info from text):
{
"description": "A description of the answer",\
"answer": "The worker dropped a box at timestamp 78.0 and it took 39 seconds to remove it",\
"updated_question": null,\
"confidence": 0.95,\
"need_image_data": "false"\
}
Example 2 (when you need visual data):
{
"description": "A description of the answer",\
"answer": null,\
"updated_question": null, //must be null\
"confidence": 0,\
"need_image_data": "true"\
}
Example 3 (when you need more context):
{
"description": "A description of the answer",\
"answer": null,\
"updated_question": "What events occurred between timestamp 75 and 80?",\
"confidence": 0,\
"need_image_data": "false"\
}
Only respond with valid JSON. Do not include any other text.
ADV_CHAT_TEMPLATE_TEXT: |+
You are an AI assistant that answers questions based on the provided context.
The context includes retrieved information and relevant chat history.
Use all available context to provide accurate and contextual answers.
If the fetched context is insufficient, formulate a better question to
fetch more relevant information.
You must respond in the following JSON format:
{
"description": "A description of the answer",\
"answer": "your answer here or null if more info needed",\
"updated_question": "reformulated question to get better database results" or null,\
"confidence": 0.95 // number between 0-1\
}
Example 1 (when you have enough info from text):
{
"description": "A description of the answer",\
"answer": "The worker dropped a box at timestamp 78.0 and it took 39 seconds to remove it",\
"updated_question": null,\
"confidence": 0.95\
}
Example 2 (when you need more context):
{
"description": "A description of the answer",\
"answer": null,\
"updated_question": "What events occurred between timestamp 75 and 80?",\
"confidence": 0\
}
Only respond with valid JSON. Do not include any other text.
ADV_CHAT_SUFFIX: |+
When you have enough information, in the "answer" field format your response according to these instructions:
Your task is to provide accurate and comprehensive responses to user queries based on the context, chat history, and available resources.
Answer the questions from the point of view of someone looking at the context.
### Response Guidelines:
1. **Direct Answers**: Provide clear and thorough answers to the user's queries without headers unless requested. Avoid speculative responses.
2. **Utilize History and Context**: Leverage relevant information from previous interactions, the current user input, and the context.
3. **No Greetings in Follow-ups**: Start with a greeting in initial interactions. Avoid greetings in subsequent responses unless there's a significant break or the chat restarts.
4. **Admit Unknowns**: Clearly state if an answer is unknown. Avoid making unsupported statements.
5. **Avoid Hallucination**: Only provide information relevant to the context. Do not invent information.
6. **Response Length**: Keep responses concise and relevant. Aim for clarity and completeness within 4-5 sentences unless more detail is requested.
7. **Tone and Style**: Maintain a professional and informative tone. Be friendly and approachable.
8. **Error Handling**: If a query is ambiguous or unclear, ask for clarification rather than providing a potentially incorrect answer.
9. **Summary Availability**: If the context is empty, do not provide answers based solely on internal knowledge. Instead, respond appropriately by indicating the lack of information.
10. **Absence of Objects**: If a query asks about objects which are not present in the context, provide an answer stating the absence of the objects in the context. Avoid giving any further explanation. Example: "No, there are no mangoes on the tree."
11. **Absence of Events**: If a query asks about an event which did not occur in the context, provide an answer which states that the event did not occur. Avoid giving any further explanation. Example: "No, the pedestrian did not cross the street."
12. **Object counting**: If a query asks the count of objects belonging to a category, only provide the count. Do not enumerate the objects.
### Example Responses:
User: Hi
AI Response: 'Hello there! How can I assist you today?'
User: "What is Langchain?"
AI Response: "Langchain is a framework that enables the development of applications powered by large language models, such as chatbots. It simplifies the integration of language models into various applications by providing useful tools and components."
User: "Can you explain how to use memory management in Langchain?"
AI Response: "Langchain's memory management involves utilizing built-in mechanisms to manage conversational context effectively. It ensures that the conversation remains coherent and relevant by maintaining the history of interactions and using it to inform responses."
User: "I need help with PyCaret's classification model."
AI Response: "PyCaret simplifies the process of building and deploying machine learning models. For classification tasks, you can use PyCaret's setup function to prepare your data. After setup, you can compare multiple models to find the best one, and then fine-tune it for better performance."
User: "What can you tell me about the latest realtime trends in AI?"
AI Response: "I don't have that information right now. Is there something else I can help with?"
**IMPORTANT** : YOUR KNOWLEDGE FOR ANSWERING THE USER'S QUESTIONS IS LIMITED TO THE CONTEXT PROVIDED ABOVE.
Note: This system does not generate answers based solely on internal knowledge. It answers from the information provided in the user's current and previous inputs, and from the context.
QUESTION_ANALYSIS_PROMPT: |+
Analyze this question and identify key elements for graph database retrieval.
Question: {question}
Identify and return as JSON:
1. Entity types mentioned. Available entity types: {entity_types}
2. Relationships of interest
3. Time references
4. Sort by: "start_time" or "end_time" or "score"
5. Location references
6. Retrieval strategy (similarity, temporal)
a. similarity: If the question needs to find similar content, return the retrieval strategy as similarity
b. temporal: If the question is about a specific time range and you can return at least one of the start and end time, then return the strategy as temporal and the start and end time in the time_references field as float or null if not present. Strategy cannot be temporal if both start and end time are not present. The start and end time should be in seconds.
Example response:
{{
"entity_types": ["Person", "Box"],
"relationships": ["DROPPED", "PICKED_UP"],
"time_references": {{
"start": 60.0,
"end": 400.0
}},
"sort_by": "start_time", // "start_time" or "end_time" or "score"
"location_references": ["warehouse_zone_A"],
"retrieval_strategy": "temporal"
}}
Output only valid JSON. Do not include any other text.
Advanced Retrieval#
Advanced Retrieval implements a modular planning and execution system that uses iterative reasoning and specialized tools for complex video analysis tasks. This strategy is supported only for GraphDBs (Neo4j, ArangoDB).
Architecture Components:
Planning Module: Creates execution plans and evaluates results to determine next steps
Execution Engine: Parses XML-structured plans and creates tool calls
Tool Node: Executes specialized search and analysis tools
Response Formatter: Formats final answers based on all collected information
Available Traversal Strategies:
chunk_search: Retrieves the most relevant chunks using vector similarity
entity_search: Retrieves entities and relationships using vector similarity
chunk_filter: Filters chunks based on time ranges and camera IDs
chunk_reader: Analyzes chunks and video frames using VLM for detailed insights
bfs: Performs breadth-first search through entity relationships
next_chunk: Retrieves chronologically adjacent chunks
Iterative Process:
Planning Phase: Planning module creates initial execution plan
Execution Phase: Execution engine parses plan and calls appropriate tools
Evaluation Phase: Results are evaluated; if incomplete, cycle repeats with refined plan
Response Phase: Final answer is generated when sufficient information is gathered
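The four phases can be sketched as a single loop. Here plan, run_tool, evaluate, and format_answer stand in for the planning module, tool node, evaluation step, and response formatter; they are placeholders, not the actual CA-RAG classes:

```python
def advanced_retrieval(question, plan, run_tool, evaluate, format_answer,
                       max_iterations=20):
    """Illustrative plan -> execute -> evaluate -> respond cycle."""
    evidence = []
    for _ in range(max_iterations):
        # Planning phase: produce tool calls (parsed from an XML plan in CA-RAG).
        tool_calls = plan(question, evidence)
        # Execution phase: run each tool (e.g. chunk_search, chunk_filter, bfs).
        for name, args in tool_calls:
            evidence.append(run_tool(name, args))
        # Evaluation phase: stop once enough information has been gathered.
        if evaluate(question, evidence):
            break
    # Response phase: format the final answer from all collected evidence.
    return format_answer(question, evidence)

answer = advanced_retrieval(
    "Where did the forklift stop?",
    plan=lambda q, ev: [("chunk_search", {"query": q})],
    run_tool=lambda name, args: f"{name} result",
    evaluate=lambda q, ev: len(ev) >= 1,
    format_answer=lambda q, ev: "; ".join(ev),
)
assert answer == "chunk_search result"
```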
Key Features:
Multi-Channel Support: Handles multiple camera streams with runtime camera information
Dynamic Tool Selection: Uses only the tools specified in configuration
Iterative Refinement: Continues until confident answer or max iterations reached
XML-Structured Plans: Uses structured XML format for reliable plan parsing
Context Awareness: Integrates video length and camera metadata into planning
Configuration Parameters
```yaml
functions:
  retriever_function:
    type: adv_graph_retrieval
    params:
      top_k: 10
      max_iterations: 20
      multi_channel: false # Enable for multi-stream processing
      tools: ["chunk_search", "chunk_filter", "entity_search", "chunk_reader"]
      prompt_config_path: "prompt_config.yaml" # optional; if not provided, default prompts are used
    tools:
      llm: chat_llm
      db: graph_db
      vlm: openai_llm # required when tools contains chunk_reader
      image_fetcher: image_fetcher # required when tools contains chunk_reader
```
Set SAVE_CHUNK_FRAMES_MINIO to true for image_fetcher to be used.
- Optional Parameters:
top_k (int): Number of chunks/entities to retrieve. Default: 5.
tools (list[str]): Planner tool names to expose to the agent. Available tools:
- chunk_search: Semantic search for relevant chunks in the graph database.
- chunk_filter: Filter chunks based on temporal range and camera ID.
- entity_search: Search for similar entities and their related chunks.
- chunk_reader: Read and analyze a specific chunk with vision capabilities.
- bfs: Breadth-first search traversal to find nodes one hop away.
- next_chunk: Navigate to the next chunk in the video.
max_iterations (int, optional): Recursion limit for agent reasoning. Default: 20.
num_frames_per_chunk (int, optional): Frames per chunk to sample for VLM. Default: 3.
num_chunks (int, optional): For image extraction, number of chunks to sample. Default: 3.
max_total_images (int, optional): Cap on total extracted images. Default: 10.
prompt_config_path (str, optional): Path to the prompt configuration file described in the Prompt Configuration section below. If not provided, default prompts are used.
Prompt Configuration
The prompt configuration file is a YAML file that configures the prompts used by Advanced Retrieval. If it is not provided, default prompts are used.
The file contains three prompts:
thinking_sys_msg_prompt: The prompt for the thinking agent.
response_sys_msg_prompt: The prompt for the response agent.
evaluation_guidance_prompt: The prompt for evaluation guidance.
Example prompt configuration file:
thinking_sys_msg_prompt: |+
You are a strategic planner and reasoning expert working with an execution agent to analyze videos.
## Your Capabilities
You do **not** call tools directly. Instead, you generate structured plans for the Execute Agent to follow.
## Workflow Steps
You will follow these steps:
### Step 1: Analyze & Plan
- Document reasoning in `<thinking></thinking>`.
- Output one or more tool calls (strict XML format) in separate 'execute' blocks.
- **CRITICAL**: When one tool's output is needed as input for another tool, make only the first tool call and wait for results.
- Stop immediately after and output `[Pause]` to wait for results.
### Step 2: Wait for Results
After you propose execute steps, stop immediately after and output `[Pause]` to wait for results.
### Step 3: Interpret & Replan
Once the Execute Agent returns results, analyze them inside `<thinking></thinking>`.
- If the results contain information needed for subsequent tool calls (like chunk IDs from ChunkFilter), use those actual values in your next tool calls.
- Propose next actions until you have enough information to answer.
### Step 4: Final Answer
Only when confident, output:
```<thinking>Final reasoning with comprehensive analysis of all evidence found</thinking><answer>Final answer with timestamps, locations, visual descriptions, and supporting evidence</answer>
```
{num_cameras_info}
{video_length_info}
CRITICAL ASSUMPTION: ALL queries describe scenes from video content that you must search for using your tools. NEVER treat queries as logic puzzles or general knowledge questions - they are ALWAYS about finding specific video content.
## Available Tools
You can call any combination of these tools by using separate <execute> blocks for each tool call. Additionally, if you include multiple queries in the same call, they must be separated by ';'.
### 1. ChunkSearch
#### Query Formats:
## Single Query
```
<execute>
<step>1</step>
<tool>chunk_search</tool>
<input>
<query>your_question</query>
<topk>10</topk>
</input>
</execute>
```
## Multiple Query
```
<execute>
<step>1</step>
<tool>chunk_search</tool>
<input>
<query>your_question;your_question;your_question</query>
<topk>10</topk>
</input>
</execute>
```
- Use case:
- Returns a ranked list of chunks, with the most relevant results at the top. For example, given the list [d, g, a, e], chunk d is the most relevant, followed by g, and so on.
- Assign topk=15 for counting problem, assign lower topk=8 for other problem
- Try to provide diverse search queries to ensure comprehensive result(for example, you can add the options into queries).
- You must generate a question for **every chunk returned by the chunk search** — do not miss any one!!!!!
- The chunk search cannot handle queries related to the global video timeline, because the original temporal signal is lost after all video chunks are split. If a question involves specific video timing, you need to boldly hypothesize the possible time range and then carefully verify each candidate chunk to locate the correct answer.
### 2. ChunkFilter
#### Query Formats:
Question about time range:
```
<execute>
<step>1</step>
<tool>chunk_filter</tool>
<input>
<range>start_time:end_time</range>
</input>
</execute>
```
Question about specific camera/video and time range:
```
<execute>
<step>1</step>
<tool>chunk_filter</tool>
<input>
<range>start_time:end_time</range>
<camera_id>camera_X</camera_id>
</input>
</execute>
```
- Use case:
- If the question mentions a specific timestamp or time, you must convert it to seconds as numeric values.
- **CRITICAL**: The range format must be <start_seconds>:<end_seconds> using ONLY numeric values in seconds.
- **DO NOT use time format like HH:MM:SS**. Convert all times to total seconds first.
- **IMPORTANT**: For camera_id, always use the format "camera_X" or "video_X" where X is the camera/video number (e.g., camera_1/video_1, camera_2/video_2, camera_3/video_3, camera_4/video_4, etc.) Mention the camera_id only when the question is about a specific camera/video.
**Time Conversion Examples:**
- "What happens at 00:05?" (5 seconds) -> Query `<execute><step>1</step><tool>chunk_filter</tool><input><range>5:15</range></input></execute>`
- "What happens at 2:15?" (2 minutes 15 seconds = 135 seconds) -> Query `<execute><step>1</step><tool>chunk_filter</tool><input><range>135:145</range></input></execute>`
- "Describe the action in the first minute." (0 to 60 seconds) -> Query `<execute><step>1</step><tool>chunk_filter</tool><input><range>0:60</range></input></execute>`
- "Events at 1:30:45" (1 hour 30 min 45 sec = 5445 seconds) -> Query `<execute><step>1</step><tool>chunk_filter</tool><input><range>5445:5455</range></input></execute>`
### 3. EntitySearch
#### Query Formats:
```
<execute>
<step>1</step>
<tool>entity_search</tool>
<input>
<query>your_question</query>
</input>
</execute>
```
- Use case:
- Returns a ranked list of entities, with the most relevant results at the top. For example, given the list [a, b, c, d, e], entity a is the most relevant, followed by b, and so on.
- Best for finding specific people, objects, or locations in video content
- Use when you need to track or identify particular entities across video segments
## SUGGESTIONS
- Try to provide diverse search queries to ensure comprehensive result(for example, you can add the options into queries).
- For counting problems, remember it is the same video, do not sum the results from multiple chunks.
- For ordering, you can either use the chunk_id or the timestamps to determine the order.
## Strict Rules
1. Response of each round should provide thinking process in <thinking></thinking> at the beginning!! Never output anything after [Pause]!!
2. You can only concatenate video chunks that are TEMPORALLY ADJACENT to each other (n;n+1), with a maximum of TWO at a time!!!
3. If you are unable to give a precise answer or you are not sure, continue calling tools for more information; if the maximum number of attempts has been reached and you are still unsure, choose the most likely one.
4. **DO NOT CONCLUDE PREMATURELY**: For complex queries (especially cross-camera tracking), you MUST make multiple tool calls and exhaust all search strategies before providing a final answer. One tool call is rarely sufficient for comprehensive analysis.
response_sys_msg_prompt: |+
You are a response agent that provides comprehensive answers based on analysis and tool results.
**CORE REQUIREMENTS:**
- Provide detailed, evidence-based answers with timestamps, locations, and visual descriptions
- Include ALL relevant findings and supporting evidence from the analysis
- Explain your conclusions and provide chronological context when relevant
- Never include chunk IDs or internal system identifiers in responses
**FORMATTING:**
- Use factual, direct language without pleasantries ("Certainly!", "Here is...", etc.)
- State "No relevant information found" if no relevant data was discovered
- Follow user-specified format requirements exactly (yes/no only, case requirements, length constraints, etc.)
- When format is specified, prioritize format compliance over comprehensive explanations
evaluation_guidance_prompt: |+
**EVALUATION GUIDANCE:**
- Conclude with gathered information if repeated tool calls yield the same results
- Never repeat identical tool calls that return no results or empty results
- For failed searches: try ChunkSearch for specific entities or break down complex terms into simpler components
Tools Configuration#
CA-RAG requires various tools and services to be configured in your config/config.yaml file. All configurations are defined under the tools section and use environment variables for sensitive information.
Milvus (Vector Database)#
Used for vector storage and retrieval in vector-based strategies.
```yaml
tools:
  vector_db:
    type: milvus
    params:
      host: !ENV ${MILVUS_DB_HOST}
      port: !ENV ${MILVUS_DB_GRPC_PORT}
    tools:
      embedding: nvidia_embedding
```
Neo4j (Graph Database)#
Used for graph storage and retrieval in graph-based strategies.
```yaml
tools:
  graph_db:
    type: neo4j
    params:
      host: !ENV ${GRAPH_DB_HOST}
      port: !ENV ${GRAPH_DB_BOLT_PORT}
      username: !ENV ${GRAPH_DB_USERNAME}
      password: !ENV ${GRAPH_DB_PASSWORD}
    tools:
      embedding: nvidia_embedding
```
- Optional Parameters:
embedding_parallel_count (int): Max parallelism for embedding tasks. Default: 1000.
ArangoDB (Graph Database)#
Used for graph storage and retrieval in graph-based strategies.
```yaml
tools:
  graph_db_arango:
    type: arango
    params:
      host: !ENV ${ARANGO_DB_HOST}
      port: !ENV ${ARANGO_DB_PORT}
      username: !ENV ${ARANGO_DB_USERNAME}
      password: !ENV ${ARANGO_DB_PASSWORD}
    tools:
      embedding: nvidia_embedding
```
Note
ArangoDB is not supported on aarch64 platforms.
- Optional Parameters:
collection_name (str): Base name used to derive vertex/edge collections. Default: default_<uuid>.
Elasticsearch (Vector Database)#
Used for vector storage and retrieval in vector-based strategies.
```yaml
tools:
  elasticsearch_db:
    type: elasticsearch
    params:
      host: !ENV ${ES_HOST}
      port: !ENV ${ES_PORT}
    tools:
      embedding: nvidia_embedding
```
- Optional Parameters:
collection_name (str): Index name to create/use. Default: default_<uuid>.
Large Language Models (LLM)#
Configure LLM services for text generation and reasoning.
```yaml
tools:
  chat_llm:
    type: llm
    params:
      model: meta/llama-3.1-70b-instruct
      base_url: https://integrate.api.nvidia.com/v1
      temperature: 0.2
      top_p: 0.7
  openai_llm:
    type: llm
    params:
      model: gpt-4o
      base_url: https://api.openai.com/v1
      temperature: 0.5
      top_p: 0.7
      api_key: !ENV ${OPENAI_API_KEY}
```
- Optional Parameters:
max_tokens (int, optional): Max tokens for legacy/completion models. Default: 2048.
max_completion_tokens (int, optional): Max tokens for new OpenAI reasoning models. Default: None.
temperature (float, optional): Randomness in generation (higher is more random). Default: 0.2.
top_p (float, optional): Nucleus sampling probability mass (disabled for reasoning models). Default: 0.7.
api_key (str, optional): API key for the configured backend. Default: "NOAPIKEYSET".
reasoning_effort (str, optional): Optional reasoning mode hint (LLM-specific). Default: None.
Embedding Models#
Configure embedding models for vector generation.
```yaml
tools:
  nvidia_embedding:
    type: embedding
    params:
      model: nvidia/llama-3.2-nv-embedqa-1b-v2
      base_url: https://integrate.api.nvidia.com/v1
      api_key: !ENV ${NVIDIA_API_KEY}
```
- Optional Parameters:
truncate (str): How to truncate long inputs (e.g., END). Default: END.
Reranker Models#
Configure reranker models for improving retrieval accuracy.
```yaml
tools:
  nvidia_reranker:
    type: reranker
    params:
      model: nvidia/llama-3.2-nv-rerankqa-1b-v2
      base_url: https://ai.api.nvidia.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking
      api_key: !ENV ${NVIDIA_API_KEY}
```
Image Fetcher#
Configure MinIO for video frame storage and retrieval.
```yaml
tools:
  image_fetcher:
    type: image
    params:
      minio_host: !ENV ${MINIO_HOST}
      minio_port: !ENV ${MINIO_PORT}
      minio_username: !ENV ${MINIO_USERNAME}
      minio_password: !ENV ${MINIO_PASSWORD}
```
Multi-Stream Support#
CA-RAG enables concurrent processing of multiple live streams or video files. Each caption and entity is tagged with stream-id and camera-id metadata, enabling precise retrieval of content from specific streams or cameras.
Using Advanced Retrieval together with multi-stream processing is recommended for best results.
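To illustrate how per-stream tagging enables targeted retrieval, the sketch below filters captions by their stream tag. The dict layout here is hypothetical; CA-RAG actually stores this metadata in the graph or vector database:

```python
# Illustrative only: captions tagged with stream/camera metadata, as CA-RAG
# does internally. The flat dict layout here is hypothetical.
captions = [
    {"text": "Forklift enters aisle 3", "stream_id": "stream_0", "camera_id": "cam_2"},
    {"text": "Worker without PPE near dock", "stream_id": "stream_1", "camera_id": "cam_5"},
    {"text": "Boxes fall from shelf", "stream_id": "stream_0", "camera_id": "cam_2"},
]

def captions_for_stream(captions, stream_id):
    """Return only the captions tagged with the requested stream."""
    return [c for c in captions if c["stream_id"] == stream_id]

print([c["text"] for c in captions_for_stream(captions, "stream_0")])
# → ['Forklift enters aisle 3', 'Boxes fall from shelf']
```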
Configuration

To enable multi-stream processing, configure the following parameters in config/config.yaml:

- Set multi_channel to true (default: false)
- Set disable_entity_description to false for improved accuracy (default: true)
- Set deduplicate_nodes to true to remove duplicate entities based on name and description (default: false)

Note

Setting disable_entity_description to false requires an LLM with structured output capabilities.
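For intuition, deduplicate_nodes collapses entities that describe the same thing. The sketch below only collapses exact matches on (name, description); the actual implementation scores similarity against duplicate_score_value rather than requiring exact equality:

```python
# Simplified sketch of node deduplication. CA-RAG scores entity similarity
# (see duplicate_score_value); here we collapse only exact matches on
# (name, description) for illustration.
def deduplicate_nodes(nodes):
    """Keep the first node seen for each (name, description) pair."""
    seen = set()
    unique = []
    for node in nodes:
        key = (node["name"], node["description"])
        if key not in seen:
            seen.add(key)
            unique.append(node)
    return unique

nodes = [
    {"name": "forklift", "description": "yellow forklift in aisle 3"},
    {"name": "forklift", "description": "yellow forklift in aisle 3"},  # duplicate
    {"name": "worker", "description": "worker without PPE"},
]
print(len(deduplicate_nodes(nodes)))  # → 2
```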
Example Configuration
tools:
  graph_db:
    type: neo4j
    params:
      host: !ENV ${GRAPH_DB_HOST}
      port: !ENV ${GRAPH_DB_PORT}
      username: !ENV ${GRAPH_DB_USERNAME}
      password: !ENV ${GRAPH_DB_PASSWORD}
    tools:
      embedding: nvidia_embedding
  chat_llm:
    type: llm
    params:
      model: meta/llama-3.1-70b-instruct
      base_url: https://integrate.api.nvidia.com/v1
      max_tokens: 4096
      temperature: 0.5
      top_p: 0.7
      api_key: !ENV ${NVIDIA_API_KEY}
  nvidia_embedding:
    type: embedding
    params:
      model: nvidia/llama-3.2-nv-embedqa-1b-v2
      base_url: https://integrate.api.nvidia.com/v1
      api_key: !ENV ${NVIDIA_API_KEY}
functions:
  summarization:
    type: batch_summarization
    params:
      batch_size: 5
      batch_max_concurrency: 20
      prompts:
        caption: "Write a concise and clear dense caption for the provided warehouse video, focusing on irregular or hazardous events such as boxes falling, workers not wearing PPE, workers falling, workers taking photographs, workers chitchatting, forklift stuck. Start and end each sentence with a time stamp."
        caption_summarization: "You should summarize the following events of a warehouse in the format start_time:end_time:caption. For start_time and end_time use . to separate seconds, minutes, hours. If during a time segment only regular activities happen, then ignore them, else note any irregular activities in detail. The output should be bullet points in the format start_time:end_time: detailed_event_description. Don't return anything else except the bullet points."
        summary_aggregation: "You are a warehouse monitoring system. Given the captions in the form start_time:end_time: caption, aggregate the following captions in the format start_time:end_time:event_description. If the event_description is the same as another event_description, aggregate the captions in the format start_time1:end_time1,...,start_timek:end_timek:event_description. If any two adjacent end_time1 and start_time2 are within a few tenths of a second, merge the captions in the format start_time1:end_time2. The output should only contain bullet points. Cluster the output into Unsafe Behavior, Operational Inefficiencies, Potential Equipment Damage and Unauthorized Personnel."
    tools:
      llm: chat_llm
      db: graph_db
  ingestion_function:
    type: graph_ingestion
    params:
      batch_size: 1
      deduplicate_nodes: true
      disable_entity_description: false
      allowed_nodes: ["person", "vehicle", "object", "location"] # default
      allowed_relationships: [] # default
      multi_channel: true
      duplicate_score_value: 0.9 # default
    tools:
      llm: chat_llm
      db: graph_db
  retriever_function:
    type: adv_graph_retrieval
    params:
      top_k: 5
      batch_size: 1
      multi_channel: true # Enable for multi-stream processing
    tools:
      llm: chat_llm
      db: graph_db
context_manager:
  functions:
    - summarization
    - ingestion_function
    - retriever_function
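The merge rule described in the summary_aggregation prompt can be illustrated in isolation. The sketch below uses plain seconds and a hypothetical 0.3 s gap threshold instead of the prompt's timestamp strings:

```python
# Sketch of the merge rule from the summary_aggregation prompt: adjacent
# segments with the same description whose gap is within a few tenths of a
# second are merged. Timestamps are plain seconds here for simplicity.

def merge_segments(segments, gap=0.3):
    """Merge adjacent (start, end, description) segments separated by <= gap seconds."""
    merged = []
    for start, end, desc in sorted(segments):
        if merged and start - merged[-1][1] <= gap and merged[-1][2] == desc:
            # Extend the previous segment instead of starting a new one.
            merged[-1] = (merged[-1][0], end, desc)
        else:
            merged.append((start, end, desc))
    return merged

segments = [
    (0.0, 10.0, "boxes falling"),
    (10.2, 20.0, "boxes falling"),   # 0.2 s gap → merged with previous
    (30.0, 40.0, "forklift stuck"),
]
print(merge_segments(segments))
# → [(0.0, 20.0, 'boxes falling'), (30.0, 40.0, 'forklift stuck')]
```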
ArangoDB Configuration#
ArangoDB is supported for multi-stream processing. To use it, replace the graph_db tool with the following configuration:
tools:
  graph_db:
    type: arango
    params:
      host: !ENV ${ARANGO_DB_HOST}
      port: !ENV ${ARANGO_DB_PORT}
      username: !ENV ${ARANGO_DB_USERNAME}
      password: !ENV ${ARANGO_DB_PASSWORD}
      multi_channel: true
    tools:
      embedding: nvidia_embedding
Alerts#
The Alerts feature provides event-based notifications. For each VLM caption, an LLM analyzes the caption and generates alerts based on event criteria defined in natural language.
For example, to configure alerts for a traffic video to detect accidents, the criteria can be defined in natural language in the UI Application.
incident: accident on the road;
response: first responders arrive for help;
When an alert is detected, the response is sent to the user using the VSS notification system. Here is an example of the alert notification:
Alert Name: incident
Detected Events: accident on the road
Time: 80 seconds
Details: 2025-03-15 12:07:39 PM: The scene depicts an intersection with painted
stop lines and directional arrows on the road surface. A red sedan and a yellow
sedan are involved in a collision within the intersection. The red sedan appears to
be impacting the yellow sedan on its front passenger side.
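A notification in this plain-text form is easy to post-process downstream. The sketch below splits it into fields; the exact format delivered by the VSS notification system may differ, so treat this as illustrative:

```python
# Hedged sketch: parsing the plain-text alert notification shown above into a
# dict. The exact notification format delivered by VSS may differ.
notification = """Alert Name: incident
Detected Events: accident on the road
Time: 80 seconds
Details: 2025-03-15 12:07:39 PM: The scene depicts an intersection..."""

def parse_alert(text):
    """Split 'Key: value' lines into a dict; only the first ': ' is a separator."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields

alert = parse_alert(notification)
print(alert["Alert Name"], "|", alert["Time"])
# → incident | 80 seconds
```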
Observability#
CA-RAG provides OpenTelemetry (OTEL) traces for observability that can be viewed in the Jaeger or Phoenix UI.
To enable OTEL traces, set the following environment variables:
export VIA_CTX_RAG_ENABLE_OTEL=true
export VIA_CTX_RAG_EXPORTER=otlp # if you want to view OTEL traces in the Jaeger UI, otherwise set 'console'
export VIA_CTX_RAG_OTEL_ENDPOINT=http://otel_collector:4318 # if exporter is set to 'otlp'