Video Summarization Workflow#

The Video Summarization Workflow enables analysis and summarization of long-form video content without being constrained by the context window limits of standard VLMs.

Capabilities

Use Cases

  • Automated incident report generation

  • Event detection in extended video archives

  • Shift summaries and daily activity reports

Technical Approach

Standard VLMs are limited to processing short video clips, usually less than one minute, depending on the number of subsampled frames and the level of detail required. This workflow uses a microservice that segments videos of any length, analyzes each segment with a VLM, and then synthesizes the results into a coherent summary with timestamped events.
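The segment-then-synthesize idea can be sketched in shell. This is an illustration only: the segment length, variable names, and the commented ffmpeg invocation are assumptions, not the microservice's actual parameters.

```shell
# Illustrative only: compute fixed-length segment boundaries for a long
# video, the same divide-and-conquer idea the LVS microservice applies.
DURATION=3605   # total video length in seconds (e.g. from ffprobe)
SEGMENT=600     # analyze in 10-minute chunks
start=0
while [ "$start" -lt "$DURATION" ]; do
    end=$(( start + SEGMENT ))
    if [ "$end" -gt "$DURATION" ]; then end=$DURATION; fi
    echo "analyze segment ${start}s-${end}s with the VLM"
    # e.g. extract the clip first:
    # ffmpeg -ss "$start" -t $(( end - start )) -i input.mp4 "seg_${start}.mp4"
    start=$end
done
# A final LLM pass would then merge the per-segment results into one
# timestamped summary.
```

Each per-segment result stays well within a standard VLM context window; only the final synthesis step sees all segments at once, and then only as text.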

Estimated Deployment Time: 15-20 minutes

The following diagram illustrates the video summarization architecture:

Vision Agent with Long Video Summarization Architecture

Key Features of the Vision Agent with Long Video Summarization:

  • Quickly generates an overall summary within seconds, offering a high-level narrative of the video

  • Produces timestamped highlights of the video based on user-defined events

  • Processes uploaded video files ranging from minutes to hours in duration

  • Generates narrative summaries of video content

  • Returns results through the AI agent interface

What’s being deployed#

  • VSS Agent: Agent service that orchestrates tool calls and model inference to answer questions and generate outputs

  • VSS Agent UI: Web UI with chat, video upload, and different views

  • VSS Video IO & Storage (VIOS): Video ingestion, recording, and playback services used by the agent for video access and management

  • Nemotron LLM (NIM): LLM inference service used for reasoning, tool selection, and response generation

  • Cosmos Reason 2 (NIM): Vision-language model with physical reasoning capabilities

  • VSS Long Video Summarization: Microservice for segmenting and summarizing long-form video content

  • ELK (without Logstash): Elasticsearch and Kibana stack for log storage and analysis

  • Phoenix: Observability and telemetry service for agent workflow monitoring

Prerequisites#

Before you begin, ensure all of the prerequisites are met. See Prerequisites for more details.

Deploy#

Note

For instructions on downloading sample data and the deployment package, see Download Sample Data and Deployment Package in the Quickstart guide.

Skip to Step 1: Deploy the Agent if you have already downloaded and deployed another agent workflow.

Step 1: Deploy the Agent#

Run the commands that match your GPU hardware (-H) and deployment mode (local models, remote LLM, remote VLM, or both remote):

# Set NGC CLI API key
export NGC_CLI_API_KEY='your_ngc_api_key'

# View all available options
scripts/dev-profile.sh --help

# H100: local LLM and VLM
scripts/dev-profile.sh up -p lvs -H H100

# H100: local models pinned to specific GPUs
scripts/dev-profile.sh up -p lvs -H H100 \
    --llm-device-id 0 --vlm-device-id 1

# H100: remote LLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p lvs -H H100 \
    --use-remote-llm

# H100: remote VLM
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p lvs -H H100 \
    --use-remote-vlm

# H100: remote LLM and remote VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p lvs -H H100 \
    --use-remote-llm --use-remote-vlm

# RTXPRO6000BW: local LLM and VLM
scripts/dev-profile.sh up -p lvs -H RTXPRO6000BW

# RTXPRO6000BW: local models pinned to specific GPUs
scripts/dev-profile.sh up -p lvs -H RTXPRO6000BW \
    --llm-device-id 0 --vlm-device-id 1

# RTXPRO6000BW: remote LLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p lvs -H RTXPRO6000BW \
    --use-remote-llm

# RTXPRO6000BW: remote VLM
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p lvs -H RTXPRO6000BW \
    --use-remote-vlm

# RTXPRO6000BW: remote LLM and remote VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p lvs -H RTXPRO6000BW \
    --use-remote-llm --use-remote-vlm

# L40S: local models pinned to specific GPUs
scripts/dev-profile.sh up -p lvs -H L40S \
    --llm-device-id 0 --vlm-device-id 1

# L40S: remote LLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p lvs -H L40S \
    --use-remote-llm

# L40S: remote VLM
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p lvs -H L40S \
    --use-remote-vlm

# L40S: remote LLM and remote VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p lvs -H L40S \
    --use-remote-llm --use-remote-vlm

See Local LLM and VLM deployments on OTHER hardware for known limitations and constraints.

# OTHER: local LLM and VLM via custom env files
scripts/dev-profile.sh up -p lvs -H OTHER \
    --llm-env-file /path/to/llm.env --vlm-env-file /path/to/vlm.env

# OTHER: local models pinned to specific GPUs
scripts/dev-profile.sh up -p lvs -H OTHER \
    --llm-device-id 0 --vlm-device-id 1 \
    --llm-env-file /path/to/llm.env --vlm-env-file /path/to/vlm.env

# OTHER: remote LLM, local VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p lvs -H OTHER \
    --use-remote-llm --vlm-env-file /path/to/vlm.env

# OTHER: remote VLM, local LLM
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p lvs -H OTHER \
    --use-remote-vlm --llm-env-file /path/to/llm.env

# OTHER: remote LLM and remote VLM
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
export VLM_ENDPOINT_URL=https://your-vlm-endpoint.com
scripts/dev-profile.sh up -p lvs -H OTHER \
    --use-remote-llm --use-remote-vlm

This command will download the necessary containers from the NGC Docker registry and start the agent. Depending on your network speed, this may take a few minutes.

This deployment uses the following defaults:

  • Host IP: the source (src) IP reported by ip route get 1.1.1.1

  • LLM model: nvidia/nvidia-nemotron-nano-9b-v2

  • VLM model: nvidia/cosmos-reason2-8b

To use a different IP than the one derived:

  • -i: Manually specify the host IP address.

  • -e: Optionally specify an externally accessible IP address for services that need to be reached from outside the host.
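For reference, this is how a source IP can be read out of ip route get output; the sample route line below uses illustrative values, not your host's actual addresses.

```shell
# Sample output of `ip route get 1.1.1.1` (illustrative values):
route_line='1.1.1.1 via 192.168.1.1 dev eth0 src 192.168.1.42 uid 1000'

# The default host IP is the token following "src" -- the address this
# host would use to reach the internet.
HOST_IP=$(printf '%s\n' "$route_line" \
    | awk '{for (i = 1; i < NF; i++) if ($i == "src") print $(i+1)}')
echo "$HOST_IP"   # prints 192.168.1.42
```

If this derived address is not reachable by your clients (for example, behind NAT), override it with -i or -e as described above.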

Note

When using a remote VLM of model-type nim (not openai), see How does a remote nim VLM access videos? for access requirements.

Once the deployment is complete, check that all the containers are running and healthy:

docker ps

Once all the containers are running, you can access the agent UI at http://<HOST_IP>:3000/.

Step 2: Upload a video#

In the chat interface, drag and drop the video warehouse_sample.mp4 into the chat window.

New chat window

Once the video is uploaded, the agent will respond with a message indicating that the video has been uploaded.

Video uploaded

Step 3: Generate a report#

You can now ask the agent to generate a report about the video. Here is an example prompt:

Can you generate a report for warehouse_sample using long video summarization?

The agent will prompt you with four dialog windows to customize the LVS microservice parameters.

You can cancel the workflow at any time by typing “/cancel” in the pop-up input box.

Scenario

Describe the monitoring context. For example:

warehouse monitoring
Pop-up for scenario input
Events

List events of interest to track. For example:

box falling, accident, person entering restricted area
Pop-up for events input
Objects of Interest

Specify objects to monitor. For example:

forklifts, pallets, workers
Pop-up for objects of interest input
Confirmation

Confirm the prompts by clicking “Submit”.

You can also redo the prompts by typing “/redo” or cancel the workflow by typing “/cancel”.

Pop-up for confirmation
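The three answers you provide travel together as a single parameter set for the LVS microservice. The sketch below is a hypothetical illustration of that shape; the field names and JSON layout are assumptions, not the actual LVS request schema.

```shell
# The three user-supplied LVS parameters from the dialogs above:
SCENARIO='warehouse monitoring'
EVENTS='box falling, accident, person entering restricted area'
OBJECTS='forklifts, pallets, workers'

# Hypothetical payload shape -- the real LVS request schema may differ.
PAYLOAD="{\"scenario\": \"$SCENARIO\", \"events\": \"$EVENTS\", \"objects\": \"$OBJECTS\"}"
echo "$PAYLOAD"
```

The scenario grounds the VLM's descriptions, while the events and objects steer which segments produce timestamped highlights.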

If you did not cancel the workflow, the agent shows the intermediate steps of its reasoning while the response is being generated, then outputs the final answer. You can download the report in PDF format by clicking “PDF Report” in the agent’s response:

Report generated

Step 4: Search for specific events in the Dashboard#

On the left sidebar, click “Dashboard” to open the Elastic dashboard in the main window. From the menu icon, choose the “Discover” tab.

Dashboard tab

Select lvs-events in the Data view dropdown. Here you can see, for each query, the events that were detected in the video. When you click a row item, a panel opens on the right side with details about the backend request and the event.

Discover tab
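The same data can also be queried from Elasticsearch directly. The query below is a sketch: the index name comes from the Data view above, but the "event" field name and the default Elasticsearch port 9200 are assumptions to verify against your deployment (check the Discover tab for the actual document fields).

```shell
# Hypothetical match query against the lvs-events index; the "event"
# field name is an assumption, not a documented schema.
QUERY='{"query": {"match": {"event": "box falling"}}}'
echo "$QUERY"

# To run it against the deployed stack (assuming the default ES port):
# curl -s "http://<HOST_IP>:9200/lvs-events/_search" \
#      -H 'Content-Type: application/json' -d "$QUERY"
```

Querying the index directly is handy for scripting alerts on top of the events the dashboard already shows.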

Step 5: Tear Down the Agent#

To tear down the agent, run the following command:

scripts/dev-profile.sh down

This command will stop and remove the agent containers.

Next steps#

Once you’ve familiarized yourself with the long video summarization workflow, you can explore adding other agent workflows, such as search and alerting.

Additionally, you can dive deeper into the agent tools for video management, report generation, and video understanding.

Known Issues#

  • For a remote OpenAI VLM endpoint, use gpt-4o for now; other models are not yet supported.

  • Not supported: OpenAI VLM with a build.nvidia.com LLM. When using a build.nvidia.com LLM, do not use an OpenAI VLM or set OPENAI_API_KEY.


Troubleshooting#

When encountering issues with the LVS workflow:

  1. View container logs - See Viewing Container Logs for instructions on viewing and analyzing container logs

  2. Navigate the Phoenix UI - See Navigating the Phoenix UI for step-by-step guidance on viewing traces and debugging agent workflows

  3. Check known issues - Review Agent Known Issues (agent) and Known Issues (UI) for documented limitations and workarounds