Python CLI Client#
A reference Python CLI client is provided along with VSS. The client internally calls the REST APIs exposed by VSS.
Prerequisites#
The Python package dependencies for the CLI client can be installed using:
# Create a virtual environment (optional)
python3 -m venv vss-cli-venv
source vss-cli-venv/bin/activate
# Install the dependencies
pip3 install tabulate tqdm sseclient-py requests fastapi uvicorn
Download the CLI client from the VSS GitHub repository.
curl -LO https://raw.githubusercontent.com/NVIDIA-AI-Blueprints/video-search-and-summarization/refs/heads/main/src/vss-engine/src/via_client_cli.py
The CLI client can be executed by running:
python3 via_client_cli.py <command> <args> [--print-curl-command]
By default, the client assumes that the VSS API server is running at http://localhost:8000. This can be configured by exporting the environment variable VIA_BACKEND=<VIA_API_URL> or by passing the --backend <VIA_API_URL> argument.
Note
Refer to Tuning Prompts for more information on tuning prompts.
Refer to Launch VSS UI for information on the backend URL for Helm deployments.
Refer to Deploy VSS for information on the backend URL for Docker Compose deployments.
The CLI client also provides an option to print the equivalent curl command for any operation. This can be done by passing the --print-curl-command argument to the client.
To get a list of all supported commands and the options supported by each command, run:
python3 via_client_cli.py -h
python3 via_client_cli.py <command> -h
File Summarization and Q&A#
The following section describes the Python CLI commands for file summarization and Q&A.
Export the VSS API endpoint as an environment variable.
export VIA_BACKEND=http://<VSS_API_HOST>:<VSS_API_PORT>
Upload the file to VSS using the add-file command.
python3 via_client_cli.py add-file warehouse.mp4 # Uploads the warehouse.mp4 file from the local file system where the client is running
The above command uploads the warehouse.mp4 file from your local file system to VSS. To instead use a file already inside the VSS container, use:
python3 via_client_cli.py add-file /opt/nvidia/via/streams/warehouse.mp4 --add-as-path # /opt/nvidia/via/streams/warehouse.mp4 is a path inside the VSS container
# Output
File added - id: 0d975eca-64f8-4c64-a4d7-a49ea834b3ee, filename warehouse.mp4, bytes 156822870, purpose vision, media_type video
Get the VLM model ID from the list-models command.
python3 via_client_cli.py list-models
# Output
┌──────────┬─────────────────────┬────────────┬────────────┐
│ ID │ Created │ Owned By │ API Type │
├──────────┼─────────────────────┼────────────┼────────────┤
│ vila-1.5 │ 2025-03-16 09:33:43 │ NVIDIA │ internal │
└──────────┴─────────────────────┴────────────┴────────────┘
Use the summarize command to summarize the file. id is the file id returned by the add-file command and model is the VLM model ID returned by the list-models command.
python3 via_client_cli.py summarize --id 0d975eca-64f8-4c64-a4d7-a49ea834b3ee --model vila-1.5 \
--chunk-duration 10 --prompt "You are a warehouse monitoring system. Describe the events in this warehouse and look for any anomalies. Start and end each sentence with a time stamp." \
--system-prompt "You are a helpful assistant. Answer the user's question." \
--caption-summarization-prompt "Summarize similar captions that are sequential to one another, while maintaining the details of each caption, in the format start_time:end_time:caption. The output should be bullet points in the format start_time:end_time: detailed_event_description." \
--summary-aggregation-prompt "Aggregate captions in the format start_time:end_time:caption based on whether captions are related to one another or create a continuous scene. The output should only be bullet points in the format start_time:end_time: detailed_event_description. " \
--enable-chat
# Output
Summarization finished
Request ID: 797ceb0a-96ec-4b29-9063-c378b4e63c9e
Request Creation Time: 2025-03-16 09:50:00
Model: vila-1.5
Object: summarization.completion
Media start offset: 00:00
Media end offset: 03:30
Chunks processed: 21
Processing Time: 46 seconds
Response:
Here are the summarized captions in the format start_time:end_time: detailed_event_description:
• 0:00-0:20: A warehouse with tall shelves stacked with boxes and pallets.
• 20.00-30.00: The warehouse is well-lit and organized, with aisles labeled "C", "D", "E", and "F". Cardboard boxes of various sizes are stacked on metal shelving units. A person wearing a high-visibility vest walks down the aisle, and there is a small cardboard box on the floor. The warehouse appears to be empty of any other people or objects.
...
Additional features like audio transcription, Set-of-Marks prompting using the CV pipeline, and alerts can also be enabled.
Q&A can be performed on the file using the chat command.
python3 via_client_cli.py chat --id 0d975eca-64f8-4c64-a4d7-a49ea834b3ee --model vila-1.5 \
--prompt "When did the forklift first arrive?"
# Output
Response:
The forklift first entered the warehouse between 0:20-0:40.
Note
Summarization and chat requests keep running on the server side until they complete, even if the client is disconnected.
Remove the file using the delete-file command.
python3 via_client_cli.py delete-file 0d975eca-64f8-4c64-a4d7-a49ea834b3ee
# Output
File deleted - id 0d975eca-64f8-4c64-a4d7-a49ea834b3ee, status True
Live Stream Summarization, Alerts, and Q&A#
The following section describes the Python CLI commands for live stream summarization, alerts, and Q&A.
Export the VSS API endpoint as an environment variable.
export VIA_BACKEND=http://<VSS_API_HOST>:<VSS_API_PORT>
Start an alert callback test server using the alert-callback-server command.
python3 via_client_cli.py alert-callback-server --host 0.0.0.0 --port 5004
# Output
Server starting. Alert callback handler at path /via-alert-callback
INFO: Started server process [2282991]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:5004 (Press CTRL+C to quit)
Add a live stream using the add-live-stream command.
python3 via_client_cli.py add-live-stream --description "Some live stream description" \
rtsp://192.168.1.100:8554/video/media1
# Output
Live stream added - id: de0e04bc-b007-4967-9cad-28f48a02c126
Get the VLM model ID from the list-models command.
python3 via_client_cli.py list-models
# Output
┌──────────┬─────────────────────┬────────────┬────────────┐
│ ID │ Created │ Owned By │ API Type │
├──────────┼─────────────────────┼────────────┼────────────┤
│ vila-1.5 │ 2025-03-16 09:33:43 │ NVIDIA │ internal │
└──────────┴─────────────────────┴────────────┴────────────┘
Use the summarize command to summarize the live-stream. id is the live-stream id returned by the add-live-stream command and model is the VLM model ID returned by the list-models command. For live-streams, the --stream flag is required.
Summaries are generated periodically, once every summary-duration seconds, and returned to the client using server-sent events. Alerts configured using the --alert flag are also sent to the client using server-sent events.
python3 via_client_cli.py summarize --id de0e04bc-b007-4967-9cad-28f48a02c126 --model vila-1.5 \
--chunk-duration 10 --summary-duration 60 \
--prompt "You are a warehouse monitoring system. Describe the events in this warehouse and look for any anomalies. Start and end each sentence with a time stamp." \
--system-prompt "You are a helpful assistant. Answer the user's question." \
--caption-summarization-prompt "Summarize similar captions that are sequential to one another, while maintaining the details of each caption, in the format start_time:end_time:caption. The output should be bullet points in the format start_time:end_time: detailed_event_description." \
--summary-aggregation-prompt "Aggregate captions in the format start_time:end_time:caption based on whether captions are related to one another or create a continuous scene. The output should only be bullet points in the format start_time:end_time: detailed_event_description. " \
--alert "safety-issues:workers not wearing safety equipment" \
--enable-chat --stream
# Output
Request ID: e35bfb7c-d856-4378-a2d1-dfc21a555faf
Request Creation Time: 2025-03-16 10:30:08
Model: vila-1.5
----------------------------------------
Object: summarization.progressing
Media start timestamp: 2025-03-16T10:30:10.254Z
Media end timestamp: 2025-03-16T10:31:10.301Z
Response:
Here are the aggregated captions in the format start_time:end_time: detailed_event_description:
• 2025-03-16T10:30:10.254Z:2025-03-16T10:30:20.254Z: A woman wearing a neon orange vest, black pants, and a pink cap walks towards the right side of the frame, followed by a woman wearing a pink hoodie, black pants, and a pink beanie walking in the same direction.
• 2025-03-16T10:30:30.298Z:2025-03-16T10:31:10.301Z: The video shows a warehouse with multiple rows of shelving units, workers wearing safety vests and hard hats, and a worker on a ladder, possibly performing maintenance or stocking. The warehouse is filled with workers engaged in various tasks such as moving boxes and organizing the shelves, with multiple workers moving around the warehouse, carrying boxes, and interacting with each other.
----------------------------------------
...
Additional features like audio transcription and Set-of-Marks prompting using the CV pipeline can also be enabled.
The above summarize command keeps running until the user interrupts the process or the live-stream ends. If the client gets disconnected from the server, it can reconnect by re-running the summarize command with the same live-stream id.
python3 via_client_cli.py summarize --id de0e04bc-b007-4967-9cad-28f48a02c126 \
--model vila-1.5 --stream
The server caches the summaries that were generated while the client was disconnected and returns them as soon as the client reconnects.
Note
The summarization request keeps running on the server side until the live-stream reaches end-of-stream, even if the client is disconnected. Delete the live-stream to stop the summarization request.
Alerts can also be configured to be sent to a callback URL using the add-alert command.
python3 via_client_cli.py add-alert --live-stream-id de0e04bc-b007-4967-9cad-28f48a02c126 \
--events "workers not wearing safety equipment" \
--callback-url "http://<HOST_IP_FOR_ALERT_SERVER>:5004/via-alert-callback"
# Output
Alert added - id: d60f3420-578b-42bb-8d34-0829d9e4269c
Q&A can be performed on the live-stream using the chat command.
python3 via_client_cli.py chat --id de0e04bc-b007-4967-9cad-28f48a02c126 --model vila-1.5 \
--prompt "How many people are seen?"
# Output
Response:
There are at least 2 people seen in the video. One woman wearing a pink beanie, black pants, and a pink hoodie is seen walking towards the right side of the frame. Another woman wearing a neon orange vest, black pants, and a pink cap is also seen walking towards the right side of the frame.
Remove the live-stream using the delete-live-stream command.
python3 via_client_cli.py delete-live-stream de0e04bc-b007-4967-9cad-28f48a02c126
# Output
Live stream deleted - id de0e04bc-b007-4967-9cad-28f48a02c126
Generating VLM Dense Captions#
This follows the same procedure as file and live stream summarization, except that the generate-vlm-captions command is called instead of the summarize command.
# Add file or live stream, note the id returned
python3 via_client_cli.py add-file ...
python3 via_client_cli.py generate-vlm-captions --id 0d975eca-64f8-4c64-a4d7-a49ea834b3ee \
--model cosmos-reason1 \
--prompt "Write a concise and clear dense caption for the provided warehouse video" \
--system-prompt "You are a helpful assistant. Answer the user's question." \
--chunk-duration 60
# Delete file or live stream
python3 via_client_cli.py delete-file 0d975eca-64f8-4c64-a4d7-a49ea834b3ee
Alert Review#
Use the review-alert command to review external alerts using VLM analysis.
python3 via_client_cli.py review-alert \
--video-path "/opt/nvidia/via/streams/warehouse.mp4" \
--prompt "Describe the person in detail." \
--system-prompt "You are a helpful assistant. Answer the user's question." \
--sensor-id "camera-001" \
--alert-type "RESTRICTED_ACCESS" \
--alert-description "Person detected" \
--event-type "person_detected" \
--event-description "Person detected" \
--alert-severity "HIGH" \
--meta-labels "location:warehouse_entrance"
# Output
Alert review completed:
Request ID: 19d8cb3c-7102-4122-9308-c1c9ceab635b
Review Status: SUCCESS
Reviewed By: cosmos-reason1
Reviewed At: 2025-09-03T12:26:49Z
Review Verification: N/A
================================================================================
REASONING
================================================================================
No reasoning available
================================================================================
Description: The person is wearing black pants and shoes, a white shirt with
rolled-up sleeves, and a yellow hard hat. They pick up an orange cone from a stack
on their right side and walk to the center of the aisle to place it next to the
yellow caution tape that divides the aisle into two sections.
Note
The video file is not uploaded; it must be accessible inside the container. A shared mount can be created for this purpose. Ensure that the video-path passed to the command points to the file's location inside the mounted directory in the container.
Multi-Image Summarization and Q&A#
The following section describes the Python CLI commands for multi-image summarization and Q&A.
Export the VSS API endpoint as an environment variable.
export VIA_BACKEND=http://<VSS_API_HOST>:<VSS_API_PORT>
Add images using the add-file command.
python3 via_client_cli.py add-file its_overlay_1.png --is-image
python3 via_client_cli.py add-file its_overlay_2.png --is-image
python3 via_client_cli.py add-file its_overlay_3.png --is-image
python3 via_client_cli.py add-file its_overlay_4.png --is-image
python3 via_client_cli.py add-file its_overlay_5.png --is-image
python3 via_client_cli.py add-file its_overlay_6.png --is-image
# Output
File added - id: 9fcf2676-6ce8-4fb8-a6b1-c833f054c656, filename its_overlay_1.png, bytes 2619926, purpose vision, media_type image
File added - id: e917b30f-43ae-4f7d-bad1-cab90d2df7c9, filename its_overlay_2.png, bytes 2436534, purpose vision, media_type image
...
Get the VLM model ID from the list-models command.
python3 via_client_cli.py list-models
# Output
┌──────────┬─────────────────────┬────────────┬────────────┐
│ ID │ Created │ Owned By │ API Type │
├──────────┼─────────────────────┼────────────┼────────────┤
│ vila-1.5 │ 2025-03-16 09:33:43 │ NVIDIA │ internal │
└──────────┴─────────────────────┴────────────┴────────────┘
Use the summarize command to summarize the images. id is the file id returned by the add-file command; multiple ids can be provided for multi-image summarization. model is the VLM model ID returned by the list-models command.
python3 via_client_cli.py summarize --id 9fcf2676-6ce8-4fb8-a6b1-c833f054c656 \
--id e917b30f-43ae-4f7d-bad1-cab90d2df7c9 \
--id 6c02d9af-b08f-4cff-9539-63b36f4badfe \
--id 32fe4823-7659-493c-a97e-44136b3efa84 \
--id b82684f3-de46-4a81-8f47-abed6c0461af \
--id 646da2d7-7dc9-4c9e-8a72-ff1727859ee2 \
--model vila-1.5 \
--prompt "You are an intelligent traffic system. You will be given a set of images from a traffic intersection. Write a detailed caption for each image to capture all traffic related events and details. For each caption, include the timestamp from the image." \
--system-prompt "You are a helpful assistant. Answer the user's question." \
--caption-summarization-prompt "Combine the captions if needed. Do not lose any information" \
--summary-aggregation-prompt "You will be given a set of captions describing several images from a traffic intersection. Write a summary of the events from the captions and include the timestamp information. " \
--enable-chat
# Output
Summarization finished
Request ID: dbcca47c-0e00-4fc7-a93f-e3eafe1cd26c
Request Creation Time: 2025-03-18 14:30:51
Model: vila-1.5
Object: summarization.completion
Media start offset: 00:00
Media end offset: 00:00
Chunks processed: 1
Processing Time: 39 seconds
Response:
**Traffic Report**
================
**00:00:00 - 00:00:11: Intersection Congestion**
---------------------------------------------
Multiple vehicles enter the intersection and wait for the right of way, including:
* A red car
* A yellow car
* A black car
* A yellow school bus
* A red fire truck
* A black and white police car
All vehicles are waiting for their turn to proceed, causing a brief congestion in the intersection.
Next, Q&A can be performed on the images using the chat command.
python3 via_client_cli.py chat --id 9fcf2676-6ce8-4fb8-a6b1-c833f054c656 \
--id e917b30f-43ae-4f7d-bad1-cab90d2df7c9 \
--id 6c02d9af-b08f-4cff-9539-63b36f4badfe \
--id 32fe4823-7659-493c-a97e-44136b3efa84 \
--id b82684f3-de46-4a81-8f47-abed6c0461af \
--id 646da2d7-7dc9-4c9e-8a72-ff1727859ee2 \
--model vila-1.5 \
--prompt "Is there a police car?"
# Output
Response:
Yes, there is a black and white police car in the intersection, waiting for the right of way.
Remove all the files using the delete-file command.
python3 via_client_cli.py delete-file 9fcf2676-6ce8-4fb8-a6b1-c833f054c656
python3 via_client_cli.py delete-file e917b30f-43ae-4f7d-bad1-cab90d2df7c9
python3 via_client_cli.py delete-file 6c02d9af-b08f-4cff-9539-63b36f4badfe
python3 via_client_cli.py delete-file 32fe4823-7659-493c-a97e-44136b3efa84
python3 via_client_cli.py delete-file b82684f3-de46-4a81-8f47-abed6c0461af
python3 via_client_cli.py delete-file 646da2d7-7dc9-4c9e-8a72-ff1727859ee2
# Output
File deleted - id 9fcf2676-6ce8-4fb8-a6b1-c833f054c656, status True
...
Multi-Stream and Concurrent Requests#
VSS supports multi-stream and concurrent requests. This can be achieved by running multiple summarization or chat commands in parallel using the Python CLI client in multiple terminals.
For programmatic API usage, this can be achieved by calling the /summarize and /chat APIs in parallel from multiple threads or processes, as shown in the sketch below.
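The following is a minimal sketch of this pattern using Python's requests and a thread pool. It assumes the /summarize endpoint accepts a JSON body with id, model, and chunk_duration fields mirroring the CLI arguments; verify the exact schema at http://<VSS_API_ENDPOINT>/docs or with --print-curl-command before relying on it.
# Minimal sketch: concurrent /summarize requests from one process.
# Assumption: JSON body fields mirror the CLI arguments; confirm the
# exact schema via --print-curl-command or the API docs.
import os
from concurrent.futures import ThreadPoolExecutor

import requests

BACKEND = os.environ.get("VIA_BACKEND", "http://localhost:8000")

def summarize(file_id, model):
    # Blocks until the server finishes summarizing this file.
    resp = requests.post(
        f"{BACKEND}/summarize",
        json={"id": file_id, "model": model, "chunk_duration": 10},
        timeout=3600,
    )
    resp.raise_for_status()
    return resp.json()

# Hypothetical ids returned earlier by add-file.
file_ids = ["<file-id-1>", "<file-id-2>", "<file-id-3>"]
with ThreadPoolExecutor(max_workers=len(file_ids)) as pool:
    for result in pool.map(lambda fid: summarize(fid, "vila-1.5"), file_ids):
        print(result)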
All Commands Reference#
The following section describes each of the commands in detail.
Files Commands#
Add File#
Calls POST /files internally. Uploads a file or adds it as a path. Prints the file id and other details.
Reference:
via_client_cli.py add-file [-h] [--add-as-path] [--is-image]
[--backend BACKEND] [--print-curl-command] file
Example for uploading a file:
Note
Supported file types: mp4 and mkv containers with H.264/H.265 video. Supported image types: jpg and png.
python3 via_client_cli.py add-file video.mp4
Example for adding a file as a path (this requires the file path to be accessible inside the container):
python3 via_client_cli.py add-file --add-as-path /media/video.mp4
Example for uploading an image file:
python3 via_client_cli.py add-file image.jpg --is-image
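For programmatic uploads, the following is a minimal sketch of the REST call the add-file command wraps. The multipart field names ("file", "purpose", "media_type") are assumptions inferred from the add-file output shown earlier; confirm them with --print-curl-command.
# Minimal sketch of uploading a file via POST /files.
# Assumption: multipart field names inferred from the CLI output;
# verify with --print-curl-command before relying on them.
import os
import requests

BACKEND = os.environ.get("VIA_BACKEND", "http://localhost:8000")

with open("video.mp4", "rb") as f:
    resp = requests.post(
        f"{BACKEND}/files",
        files={"file": ("video.mp4", f)},
        data={"purpose": "vision", "media_type": "video"},
    )
resp.raise_for_status()
print(resp.json())  # should include the file id used by later commands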
List Files#
Calls GET /files internally. Prints the list of files added to the server and their details in a tabular format.
Reference:
via_client_cli.py list-files [-h] [--backend BACKEND]
[--print-curl-command]
Example:
python3 via_client_cli.py list-files
Get File Details#
Calls GET /files/{id} internally. Prints the details of the file.
Reference:
via_client_cli.py file-info [-h] [--backend BACKEND]
[--print-curl-command] file_id
Example:
python3 via_client_cli.py file-info 7ce1127a-2009-4bfa-bdf8-efa9e1f37fa4
Get File Contents#
Calls GET /files/{id}/content internally. Saves the content to a new file.
Reference:
via_client_cli.py file-content [-h] [--backend BACKEND]
[--print-curl-command] file_id
Example:
python3 via_client_cli.py file-content 7ce1127a-2009-4bfa-bdf8-efa9e1f37fa4
Delete File#
Calls DELETE /files/{id} internally. Prints the delete status and file details.
Reference:
via_client_cli.py delete-file [-h] [--backend BACKEND]
[--print-curl-command] file_id
Example:
python3 via_client_cli.py delete-file 7ce1127a-2009-4bfa-bdf8-efa9e1f37fa4
Live Stream Commands#
Add Live Stream#
Calls POST /live-stream internally. Prints the live-stream id if it is added successfully.
Reference:
via_client_cli.py add-live-stream [-h] [--description DESCRIPTION]
[--username USERNAME] [--password PASSWORD] [--backend BACKEND]
[--print-curl-command]
live_stream_url
Example:
python3 via_client_cli.py add-live-stream --description "Some live stream description" \
rtsp://192.168.1.100:8554/video/media1
List Live Streams#
Calls GET /live-stream internally. Prints the list of live-streams and their details in a tabular format.
Reference:
via_client_cli.py list-live-streams [-h] [--backend BACKEND]
[--print-curl-command]
Example:
python3 via_client_cli.py list-live-streams
Delete Live Stream#
Calls DELETE /live-stream/{id} internally. Prints the status confirming deletion of the live-stream.
Reference:
via_client_cli.py delete-live-stream [-h] [--backend BACKEND] [--print-curl-command] video_id
Example:
python3 via_client_cli.py delete-live-stream ea071500-3a47-4e6f-87da-1bc796075344
Models Commands#
List Models#
Calls GET /models internally. Prints the list of models loaded by the server and their details in a tabular format.
Reference:
via_client_cli.py list-models [-h] [--backend BACKEND] [--print-curl-command]
Example:
python3 via_client_cli.py list-models
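A minimal programmatic equivalent, assuming only that GET /models returns the model list as JSON:
# Minimal sketch of listing models via GET /models.
import os
import requests

BACKEND = os.environ.get("VIA_BACKEND", "http://localhost:8000")

resp = requests.get(f"{BACKEND}/models")
resp.raise_for_status()
print(resp.json())  # model IDs here are passed as --model to other commands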
Summarization Command#
Calls POST /summarize internally. Triggers summarization on a file or live-stream and blocks until summarization is complete or you interrupt the process.
The command allows some configurable parameters with the summarize request.
For files, results are available after the entire file is summarized. The command then prints the results.
For live-streams, results become available periodically; the period depends on chunk_duration and summary_duration. Interrupting the summarize command does not stop the summarization on the server side. You can reconnect to the live-stream by re-running the summarize command with the same live-stream id.
Multiple image files can be summarized together by specifying --id <image-file-id> multiple times. This works only for image files.
To enable later chat or Q&A based on the prompts in the current summarize API call, add --enable-chat.
To enable audio transcription and the use of audio transcripts in summarization, and in later Q&A if enabled, add --enable-audio.
To enable CV metadata generation and usage, add --enable-cv-metadata.
To provide a custom prompt to the object detector in the CV pipeline, add --cv-pipeline-prompt <prompt>. For example, to detect persons and forklifts, add --cv-pipeline-prompt "person . forklift .".
To enable alerts for live-streams and files using server-sent events, specify the events to be alerted for using --alert <alert_name>:<event1>,<event2>,.... For example, --alert "safety-issues:workers not wearing ppe,boxes falling". This argument can be specified multiple times for multiple alerts.
For more details on each argument, refer to the help command and the VSS
API reference. Detailed VSS API documentation is available at http://<VSS_API_ENDPOINT>/docs
after VSS is deployed.
Reference:
via_client_cli.py summarize [-h] --id ID --model MODEL
[--stream]
[--chunk-duration CHUNK_DURATION]
[--chunk-overlap-duration CHUNK_OVERLAP_DURATION]
[--summary-duration SUMMARY_DURATION]
[--prompt PROMPT]
[--system-prompt SYSTEM_PROMPT]
[--caption-summarization-prompt CAPTION_SUMMARIZATION_PROMPT]
[--summary-aggregation-prompt SUMMARY_AGGREGATION_PROMPT]
[--file-start-offset FILE_START_OFFSET]
[--file-end-offset FILE_END_OFFSET]
[--model-temperature MODEL_TEMPERATURE]
[--model-top-p MODEL_TOP_P]
[--model-top-k MODEL_TOP_K]
[--model-max-tokens MODEL_MAX_TOKENS]
[--model-seed MODEL_SEED]
[--alert ALERT]
[--enable-chat]
[--enable-audio]
[--enable-cv-metadata]
[--cv-pipeline-prompt CV_PIPELINE_PROMPT]
[--response-format {json_object,text}]
[--backend BACKEND]
[--print-curl-command]
[--summarize-batch-size SUMMARIZE_BATCH_SIZE]
[--rag-batch-size RAG_BATCH_SIZE]
[--rag-top-k RAG_TOP_K]
[--summarize-top-p SUMMARIZE_TOP_P]
[--summarize-temperature SUMMARIZE_TEMPERATURE]
[--summarize-max-tokens SUMMARIZE_MAX_TOKENS]
[--chat-top-p CHAT_TOP_P]
[--chat-temperature CHAT_TEMPERATURE]
[--chat-max-tokens CHAT_MAX_TOKENS]
[--notification-top-p NOTIFICATION_TOP_P]
[--notification-temperature NOTIFICATION_TEMPERATURE]
[--notification-max-tokens NOTIFICATION_MAX_TOKENS]
Example:
python3 via_client_cli.py summarize \
--id ea071500-3a47-4e6f-87da-1bc796075344 \
--model gpt-4o \
--chunk-duration 60 \
--stream \
--prompt "Write a dense caption about the video containing events like ..." \
--model-temperature 0.8
python3 via_client_cli.py summarize \
--id ea071500-3a47-4e6f-87da-1bc796075344 \
--model gpt-4o \
--chunk-duration 10
python3 via_client_cli.py summarize \
--id ea071500-3a47-4e6f-87da-1bc796075344 \
--model gpt-4o \
--chunk-duration 60 \
--prompt "Write a dense caption about the video containing events like ..." \
--caption-summarization-prompt "Summarize the events in the video using provided video description and audio transcripts. ..." \
--summary-aggregation-prompt "Given the video descriptions and audio transcripts, aggregate them to a concise summary with timestamps. ..." \
--enable-chat \
--enable-audio
python3 via_client_cli.py summarize \
--id ea071500-3a47-4e6f-87da-1bc796075344 \
--model gpt-4o \
--chunk-duration 60 \
--enable-cv-metadata \
--cv-pipeline-prompt "person . forklift ."
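For consuming streaming results programmatically, the following is a minimal sketch using sseclient-py (installed with the other dependencies). The JSON body fields mirror the CLI arguments and are assumptions; verify the exact schema at http://<VSS_API_ENDPOINT>/docs.
# Minimal sketch: stream periodic live-stream summaries over SSE.
# Assumption: body fields mirror the CLI arguments; verify via the API docs.
import os

import requests
import sseclient  # sseclient-py

BACKEND = os.environ.get("VIA_BACKEND", "http://localhost:8000")

resp = requests.post(
    f"{BACKEND}/summarize",
    json={
        "id": "<live-stream-id>",  # hypothetical id from add-live-stream
        "model": "vila-1.5",
        "chunk_duration": 10,
        "summary_duration": 60,
        "stream": True,
    },
    stream=True,
)
resp.raise_for_status()

# Each server-sent event carries one periodic result; the loop ends
# when the server closes the stream (for example, at end-of-stream).
for event in sseclient.SSEClient(resp).events():
    print(event.data)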
Chat Command#
Calls POST /chat/completions internally. Triggers a Q&A query on a file or live-stream and blocks until results are received.
The command allows sending the prompt (question) along with some configurable parameters for the Q&A request.
For the chat command to work, a summarize query with --enable-chat must already have completed.
For more details on each argument, refer to the help command and the VSS
API reference. Detailed VSS API documentation is available at http://<VSS_API_ENDPOINT>/docs
after VSS is deployed.
Reference:
via_client_cli.py chat [-h] --id ID --model MODEL
[--stream]
[--prompt PROMPT]
[--file-start-offset FILE_START_OFFSET]
[--file-end-offset FILE_END_OFFSET]
[--model-temperature MODEL_TEMPERATURE]
[--model-top-p MODEL_TOP_P]
[--model-top-k MODEL_TOP_K]
[--model-max-tokens MODEL_MAX_TOKENS]
[--model-seed MODEL_SEED]
[--response-format {json_object,text}]
[--backend BACKEND] [--print-curl-command]
Example:
python3 via_client_cli.py chat \
--id d0b997f6-869a-4b2b-aee4-ea92d204d6bf \
--model vila-1.5 \
--prompt "Is there a person wearing white shirt?"
Alerts Commands#
Add Live Stream Alert#
Calls POST /alerts internally. Adds an alert callback for a live-stream based on currently running summarization prompts. Prints the alert ID if added successfully.
Whenever an alert is detected, VSS will POST a request to the configured URL with alert details.
Reference:
via_client_cli.py add-alert [-h] --live-stream-id LIVE_STREAM_ID
--callback-url CALLBACK_URL --events EVENTS
[--callback-json-template CALLBACK_JSON_TEMPLATE]
[--callback-token CALLBACK_TOKEN]
[--backend BACKEND] [--print-curl-command]
Example:
python3 via_client_cli.py add-alert \
--live-stream-id ea071500-3a47-4e6f-87da-1bc796075344 \
--callback-url http://localhost:14000/via-alert-callback \
--events "worker not wearing ppe" --events "boxes falling"
Refer to Alert Callback Test Server for starting a test alert callback server.
List Live Stream Alerts#
Calls GET /alerts internally. Gets the list of all live-stream alerts that have been configured.
Reference:
via_client_cli.py list-alerts [-h] [--backend BACKEND] [--print-curl-command]
Example:
python3 via_client_cli.py list-alerts
Delete Live Stream Alert#
Calls DELETE /alerts/{id} internally. Deletes a live-stream alert using its ID.
Reference:
via_client_cli.py delete-alert [-h] [--backend BACKEND] [--print-curl-command] alert_id
Example:
python3 via_client_cli.py delete-alert f82b2c8b-e5ce-49d7-be7e-053717d0cd49
List Recent Live Stream Alerts#
Calls GET /alerts internally. Gets the list of recent live-stream alerts that were detected.
Reference:
via_client_cli.py list-recent-alerts [-h] [--live-stream-id LIVE_STREAM_ID]
[--backend BACKEND] [--print-curl-command]
Example:
python3 via_client_cli.py list-recent-alerts
Alert Callback Test Server#
Start a test server for receiving live-stream alerts from VSS. The server prints the alerts as received.
Reference:
via_client_cli.py alert-callback-server [-h] [--host HOST] [--port PORT]
Example:
python3 via_client_cli.py alert-callback-server --host 0.0.0.0 --port 5004
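If you need custom handling instead of the built-in test server, the following is a minimal sketch of an equivalent receiver using FastAPI and uvicorn (both installed with the other dependencies). It assumes only that VSS POSTs a JSON payload with the alert details to the configured callback URL.
# Minimal sketch of a custom alert callback receiver.
# Assumption: VSS POSTs a JSON payload with alert details to this URL.
from fastapi import FastAPI, Request
import uvicorn

app = FastAPI()

@app.post("/via-alert-callback")
async def via_alert_callback(request: Request):
    alert = await request.json()
    print("Received alert:", alert)  # inspect, log, or forward the alert here
    return {"status": "ok"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=5004)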
Alert Review Commands#
Review Alert#
Calls POST /reviewAlert internally. Reviews an external alert using VLM analysis to get more details about the alert and/or to determine whether the alert is valid based on video content.
This command/API can generate a dense caption, a boolean true/false answer, or both. The user must configure the prompt and system prompt accordingly.
Additionally, do_verification may be set to true. When this is set, VSS looks for truthy words like yes or true in the VLM response and sets verification_result accordingly.
Reasoning can be requested by setting enable_reasoning to true. In this case, the system prompt can optionally be modified to request that the VLM respond with <think></think> and <answer></answer> tags. If the user does not do this explicitly, VSS modifies the prompt internally.
Examples for various use cases:
Caption Only:
system_prompt: You are a helpful assistant. Answer the user's question.
prompt: Describe the scene in the video in one line.
do_verification: false
Caption with Boolean Answer:
system_prompt: You are a helpful assistant. Answer the user's question.
prompt: Did a person enter the room? Describe the scene in the video in one line.
do_verification: true
Boolean Answer Only:
system_prompt: You are a helpful assistant. Answer the user's question with a yes or no only.
prompt: Did a person enter the room?
do_verification: true
If either the video path or the CV metadata path is specified as a relative path, the environment variable ALERT_REVIEW_MEDIA_BASE_DIR is used as the base directory and prepended to those paths.
When using the sample CV pipeline provided in the examples, the CV detection video clips will be generated into this directory after running the samples in Computer Vision Pipeline Manager UI.
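The resolution rule can be illustrated with a short sketch (the helper name below is hypothetical, for illustration only):
# Sketch of the path-resolution rule described above: relative paths are
# resolved against ALERT_REVIEW_MEDIA_BASE_DIR, absolute paths are used as-is.
import os

def resolve_media_path(path):  # hypothetical helper for illustration
    base = os.environ.get("ALERT_REVIEW_MEDIA_BASE_DIR", "")
    return path if os.path.isabs(path) else os.path.join(base, path)

print(resolve_media_path("/opt/nvidia/via/streams/warehouse.mp4"))  # unchanged
print(resolve_media_path("clip_1.mp4"))  # prefixed with the base directory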
Reference:
via_client_cli.py review-alert [-h] [--backend BACKEND] [--print-curl-command]
--video-path VIDEO_PATH --prompt PROMPT --sensor-id SENSOR_ID
--alert-type ALERT_TYPE --alert-description ALERT_DESCRIPTION
--event-type EVENT_TYPE --event-description EVENT_DESCRIPTION
[--version VERSION] [--id ID] [--timestamp TIMESTAMP]
[--confidence CONFIDENCE] [--cv-metadata-path CV_METADATA_PATH]
[--do-verification] [--system-prompt SYSTEM_PROMPT]
[--alert-severity {LOW,MEDIUM,HIGH,CRITICAL}]
[--chunk-duration CHUNK_DURATION]
[--chunk-overlap-duration CHUNK_OVERLAP_DURATION]
[--num-frames-per-chunk NUM_FRAMES_PER_CHUNK]
[--enable-caption] [--cv-metadata-overlay] [--debug]
[--max-tokens MAX_TOKENS] [--temperature TEMPERATURE]
[--top-p TOP_P] [--top-k TOP_K] [--seed SEED]
[--enable-reasoning] [--meta-labels META_LABELS [META_LABELS ...]]
Example:
# Change the --video-path to the correct file name under $ALERT_REVIEW_MEDIA_BASE_DIR
python3 via_client_cli.py review-alert \
--video-path "warehouse_ladder_safety_1080p_2025-09-05T18-47-18.617436Z_st_3.100_end_28.133_clip_1.mp4" \
--prompt "Is anyone on the ladder without a hardhat and safety vest?" \
--system-prompt "You are a helpful assistant. Answer the user's question in yes or no along with a one line description." \
--sensor-id "camera-001" \
--alert-type "RESTRICTED_ACCESS" \
--alert-description "Person detected" \
--event-type "person_detected" \
--event-description "Person detected" \
--alert-severity "HIGH" \
--do-verification \
--meta-labels "location:warehouse_entrance"
# Output
Alert review completed:
Request ID: 4e09734b-4d19-4242-bf3b-927dcf1b2499
Review Status: SUCCESS
Reviewed By: cosmos-reason1
Reviewed At: 2025-09-05T19:08:24Z
Review Verification: False
================================================================================
REASONING
================================================================================
No reasoning available
================================================================================
Description: No, everyone on the ladder is wearing both a hard hat and a safety vest as per the video.
The individual enters the frame from the right, walks towards the ladder, climbs it while holding boxes,
and begins placing them on the top shelf. All actions occur within this timeframe, ensuring compliance
with safety protocols at all times.
Note
The video file and CV metadata file will not be uploaded to the container; they must be accessible inside the container. A shared mount can be created for this purpose. Ensure that the video path and CV metadata path passed to the command point to this mounted directory inside the container.
The CV metadata file should contain frame-by-frame object detection results in JSON format. Below is a sample structure:
Sample CV Metadata JSON Structure
[
{
"frameHeight": 1080, // Height of the video frame in pixels
"frameNo": 0, // Sequential frame number (0-based index)
"frameWidth": 1920, // Width of the video frame in pixels
"objects": [], // Array of detected objects in this frame (empty if no objects detected)
"timestamp": 0 // Frame timestamp in nanoseconds from video start
},
{
"frameHeight": 1080,
"frameNo": 1,
"frameWidth": 1920,
"objects": [],
"timestamp": 33333333
},
{
"frameHeight": 1080,
"frameNo": 2773,
"frameWidth": 1920,
"objects": [ // Array of detected objects in this frame
{
"bbox": { // Bounding box coordinates of the first detected object
"bY": 129.65, // Bottom Y coordinate (bottom edge of bounding box)
"lX": 137.7, // Left X coordinate (left edge of bounding box)
"rX": 299.0, // Right X coordinate (right edge of bounding box)
"tY": 6.0 // Top Y coordinate (top edge of bounding box)
},
"conf": 0.75, // Tracking confidence score (0.0 to 1.0)
"id": 14, // Unique tracking ID for this object across frames
"misc": [ // Additional metadata for the tracked object. Will be an array in case multiple chunks are fused together.
{
"bbox": {
"bY": 129.66,
"lX": 137.68,
"rX": 299.02,
"tY": 5.99
},
"chId": 1, // Chunk ID for multi-chunk processing
"conf": 0.73, // Tracking confidence within the chunk
"seg": {} // Segmentation data (empty object for this example)
}
],
"type": "vehicle ." // Object type/class (for example, "vehicle", "person", "forklift")
},
{
"bbox": { // Bounding box coordinates of the second detected object
"bY": 410.7,
"lX": 882.75,
"rX": 1140.3,
"tY": 291.1
},
"conf": 0.75,
"id": 13,
"misc": [
{
"bbox": {
"bY": 410.68,
"lX": 882.73,
"rX": 1140.32,
"tY": 291.08
},
"chId": 1,
"conf": 0.76,
"seg": {}
}
],
"type": "vehicle ."
},
{
"bbox": { // Bounding box coordinates of the third detected object
"bY": 343.45,
"lX": 797.1,
"rX": 1035.7,
"tY": 213.9
},
"conf": 0.8,
"id": 12,
"misc": [
{
"bbox": {
"bY": 343.44,
"lX": 797.11,
"rX": 1035.7,
"tY": 213.89
},
"chId": 1,
"conf": 0.81,
"seg": {}
}
],
"type": "vehicle ."
}
],
"timestamp": 92433333333
}
]
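To work with such a file programmatically, here is a minimal sketch (note that the // annotations above are explanatory and would not appear in real JSON; the file name below is hypothetical):
# Minimal sketch: load a CV metadata file in the structure above and
# print per-frame detections. Field names follow the sample JSON.
import json

with open("cv_metadata.json") as f:  # hypothetical file name
    frames = json.load(f)

for frame in frames:
    for obj in frame["objects"]:
        bbox = obj["bbox"]
        width = bbox["rX"] - bbox["lX"]   # right minus left edge
        height = bbox["bY"] - bbox["tY"]  # bottom minus top edge
        print(
            f"frame {frame['frameNo']} @ {frame['timestamp']} ns: "
            f"{obj['type']} id={obj['id']} conf={obj['conf']} "
            f"({width:.1f}x{height:.1f} px)"
        )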
VLM Captions Generation Commands#
Generate VLM Captions#
Calls POST /generate_vlm_captions internally. Generates VLM captions for video files or live streams using vision language models. For live streams, captions are generated in real time and can be streamed using server-sent events.
Reference:
via_client_cli.py generate-vlm-captions [-h] [--backend BACKEND] [--print-curl-command]
--id ID [ID ...] --model MODEL [--stream] [--chunk-duration CHUNK_DURATION]
[--chunk-overlap-duration CHUNK_OVERLAP_DURATION] [--prompt PROMPT]
[--system-prompt SYSTEM_PROMPT]
[--file-start-offset FILE_START_OFFSET]
[--file-end-offset FILE_END_OFFSET]
[--model-temperature MODEL_TEMPERATURE] [--model-top-p MODEL_TOP_P]
[--model-top-k MODEL_TOP_K] [--model-max-tokens MODEL_MAX_TOKENS]
[--model-seed MODEL_SEED]
[--response-format {json_object,text}]
[--enable-cv-metadata]
[--cv-pipeline-prompt CV_PIPELINE_PROMPT]
[--num-frames-per-chunk NUM_FRAMES_PER_CHUNK]
[--vlm-input-width VLM_INPUT_WIDTH]
[--vlm-input-height VLM_INPUT_HEIGHT]
[--enable-reasoning]
Example for video file:
python3 via_client_cli.py generate-vlm-captions \
--id 0d975eca-64f8-4c64-a4d7-a49ea834b3ee \
--model cosmos-reason1 \
--prompt "Write a concise and clear dense caption for the provided warehouse video" \
--system-prompt "You are a helpful assistant. Answer the user's question." \
--chunk-duration 60 \
--enable-reasoning
# Output
Request ID: e35bfb7c-d856-4378-a2d1-dfc21a555faf
Request Creation Time: 2025-03-16 10:30:08
Model: cosmos-reason1
Note: VLM Captions generate raw chunk responses from the VLM model (not summaries)
Media start offset: 00:00
Media end offset: 49:44
Chunks processed: 50
Processing Time: 132 seconds
Raw VLM Caption Responses (by chunk):
+---------+--------------+------------+----------------------------------------------------+----------------------------------------------------+
| Chunk | Start Time | End Time | Raw Caption | Reasoning |
+=========+==============+============+====================================================+====================================================+
| 1 | 0 | 60 | The video depicts a bustling warehouse scene where | Okay, let's break down the user's task. They want |
| | | | three individuals wearing yellow safety vests | a concise and clear dense caption for a |
| | | | and... | warehouse... |
+---------+--------------+------------+----------------------------------------------------+----------------------------------------------------+
| 2 | 60 | 120 | The video depicts a sequence in a warehouse | Okay, let's try to figure out what's happening |
| | | | environment featuring a worker operating a **red | here. The video shows a warehouse scene where |
| | | | forklif... | worker... |
+---------+--------------+------------+----------------------------------------------------+----------------------------------------------------+
Example for live stream:
python3 via_client_cli.py generate-vlm-captions \
--id de0e04bc-b007-4967-9cad-28f48a02c126 \
--model cosmos-reason1 \
--prompt "Write a concise and clear dense caption for the provided warehouse live stream" \
--system-prompt "You are a helpful assistant. Answer the user's question." \
--chunk-duration 60 \
--enable-reasoning \
--stream
# Output
Request ID: e35bfb7c-d856-4378-a2d1-dfc21a555faf
Request Creation Time: 2025-03-16 10:30:08
Model: cosmos-reason1
Note: VLM Captions generate raw chunk responses from the VLM model (not summaries)
Media start timestamp: 2025-08-20T06:23:26.120Z
Media end timestamp: 2025-08-20T06:24:26.120Z
Raw VLM Caption Response:
[06:23:26 - 06:24:26] The video depicts a bustling warehouse scene where three workers in safety gear collaborate on logistical tasks. Initially, they stand beside shelves loaded with cardboard boxes and wooden crates. One worker near a large wooden crate remains stationary, while another walks toward a red pallet jack. Meanwhile, a third worker operates a red forklift truck, which begins stationary but gradually advances forward, navigating through the aisle. As the sequence progresses, the forklift operator maneuvers closer to the camera, demonstrating active engagement with the machinery. Throughout the clip, subtle movements from the other two workers suggest coordination in preparing to load/unload items onto the pallet jack. The overall atmosphere reflects organized industrial activity within a spacious, well-lit facility.
Reasoning:
Okay, let's break down the user's query step by step. They want a concise and clear dense caption for a warehouse live stream video. The key elements are to describe what happens from start to end within that timeframe.
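A minimal programmatic sketch of the file case follows. The JSON body fields mirror the CLI arguments and are assumptions; verify the exact schema with --print-curl-command or at http://<VSS_API_ENDPOINT>/docs.
# Minimal sketch: request raw VLM captions via POST /generate_vlm_captions.
# Assumption: body fields mirror the CLI arguments; verify via the API docs.
import os
import requests

BACKEND = os.environ.get("VIA_BACKEND", "http://localhost:8000")

resp = requests.post(
    f"{BACKEND}/generate_vlm_captions",
    json={
        "id": "<file-id>",  # hypothetical id from add-file
        "model": "cosmos-reason1",
        "prompt": "Write a concise and clear dense caption for the provided warehouse video",
        "chunk_duration": 60,
    },
    timeout=3600,
)
resp.raise_for_status()
print(resp.json())  # per-chunk captions (and reasoning, if enabled)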
Server Health and Metrics Commands#
Server Health Check#
Calls GET /health/ready internally. Checks the response status code and prints the server health status.
Reference:
via_client_cli.py server-health-check [-h] [--backend BACKEND] [--print-curl-command]
Example:
python3 via_client_cli.py server-health-check
Server Metrics#
Calls GET /metrics internally. Prints the server metrics. The metrics are in Prometheus format.
Reference:
via_client_cli.py server-metrics [-h] [--backend BACKEND] [--print-curl-command]
Example:
python3 via_client_cli.py server-metrics
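A minimal sketch combining both checks programmatically; GET /health/ready and GET /metrics are the endpoints the two commands above wrap.
# Minimal sketch: programmatic health check and metrics scrape.
import os
import requests

BACKEND = os.environ.get("VIA_BACKEND", "http://localhost:8000")

health = requests.get(f"{BACKEND}/health/ready")
print("ready" if health.status_code == 200 else f"not ready ({health.status_code})")

metrics = requests.get(f"{BACKEND}/metrics")
print(metrics.text)  # Prometheus text exposition format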