Quickstart#

Overview#

Deploy a vision agent in 10 minutes

This guide walks you through deploying a vision agent using the VSS Blueprint. You’ll create a simple vision agent to which you can upload videos, then ask questions about those videos and generate reports. Afterwards, you can explore adding agent workflows such as video summarization, search, and alerting.

The following diagram illustrates a conceptual architecture of the base vision agent that you’ll deploy:

Key Features of the Base Vision Agent:

Upload videos to a video management system and connect the agent to it through the MCP service.
Report generation tool that uses an LLM and a VLM to generate reports from the videos.
Video understanding tool that uses a VLM to perform video understanding tasks, such as Q&A.

What’s being deployed#

VSS Agent: Agent service that orchestrates tool calls and model inference to answer questions and generate outputs
VSS Agent UI: Web UI with chat, video upload, and different views
VSS Video IO & Storage (VIOS): Video ingestion, recording, and playback services used by the agent for video access and management
Nemotron LLM (NIM): LLM inference service used for reasoning, tool selection, and response generation
Cosmos Reason 2 (NIM): Vision-language model with physical reasoning capabilities
Phoenix: Observability and telemetry service for agent workflow monitoring

Prerequisites#

Before you begin, ensure all of the prerequisites are met. See Prerequisites for more details.

Download Sample Data and Deployment Package#

Configure NGC Access#

Note

Before proceeding, ensure that NGC CLI is installed on your system. For installation instructions, see Install NGC CLI in the Prerequisites.
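
One quick way to confirm that the CLI is on your PATH before continuing is a small shell check. This is a minimal sketch: the helper name `require_tool` is illustrative and is not part of the VSS Blueprint or the NGC CLI.

```shell
# Illustrative helper (an assumption, not part of the blueprint):
# succeed only if the named command is available on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check for the NGC CLI and print a hint if it is absent.
require_tool ngc || echo "Install the NGC CLI first (see Prerequisites)."
```

The same helper can be reused for other tools this guide relies on, for example `require_tool git-lfs`.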

Download Sample Data From NGC#

Perform this step on the machine where you will use the web browser.

You can download the sample data with the NGC CLI as shown below, or download it directly from the NGC UI.

 # Download sample data
 ngc registry resource download-version nvidia/vss-developer/dev-profile-sample-data:3.1.0

 tar -xf dev-profile-sample-data_v3.1.0/dev-profile-sample-data.tar.gz -C </path/to/extract/to>

 rm -rf dev-profile-sample-data_v3.1.0
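
Note that `tar -C` expects the target directory to exist already. The extraction step above can be wrapped in a small helper that creates the directory first. This is a sketch under assumptions: the function name `extract_to` and the destination path in the example call are hypothetical, so substitute your own.

```shell
# Illustrative helper (an assumption, not part of the blueprint):
# extract an archive into a directory, creating the directory first
# because `tar -C` fails if it does not exist.
extract_to() {
  archive="$1"
  dest="$2"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}

# Example call; the destination below is a hypothetical path.
# extract_to dev-profile-sample-data_v3.1.0/dev-profile-sample-data.tar.gz "$HOME/vss-sample-data"
```

Run the helper before removing the downloaded `dev-profile-sample-data_v3.1.0` directory, since it reads the archive from there.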

Download the Deployment Package#

Perform this on the machine where you intend to deploy the agent.

Note

Git LFS required: The repository uses Git LFS for large files. Install it before cloning or pulling. For example, on Ubuntu/Debian: sudo apt-get install git-lfs. On other systems, see Git LFS installation.

git clone https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization.git
cd video-search-and-summarization
git checkout tags/v3.1.0
git lfs install
git lfs pull
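
After cloning, it can be useful to confirm that the working tree is at the expected release tag. The following is a minimal sketch that relies only on `git describe`; the helper name `on_tag` is illustrative and is not part of the repository.

```shell
# Illustrative helper (an assumption, not part of the repository):
# succeed only if HEAD is exactly at the given tag.
on_tag() {
  current="$(git describe --tags --exact-match 2>/dev/null)" || current=""
  [ "$current" = "$1" ]
}

# Run from inside the cloned repository.
if on_tag v3.1.0; then
  echo "checked out v3.1.0"
else
  echo "HEAD is not at v3.1.0" >&2
fi
```

To see which files are tracked by Git LFS, you can also run `git lfs ls-files` from the repository root.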

Next steps#

Once you’ve familiarized yourself with the base vision agent, you can test other videos and explore adding agent workflows, such as video summarization, search, and alerting.

Additionally, you can dive deeper into the agent tools for report generation, video understanding, and video management.

Known Issues#

The cosmos-reason2-8b NIM cannot be restarted after it has been stopped or its container has crashed. To restart it, you need to redeploy the entire blueprint.
