FAQ#

Overview#

This page contains frequently asked questions and their answers. In some cases, steps for debugging and resolving issues are provided. Any questions that are not addressed on this page or in Known Issues should be raised on the official forum.

Prerequisite FAQs

Deployment FAQs

Prerequisite FAQs#

Failed to fetch blueprint: 403 Forbidden#

Error

Error: failed to fetch https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-vss-2.0.0.tgz : 403 Forbidden

This can occur for multiple reasons, most commonly:

  • The account does not have VSS EA enablement

  • The wrong NGC API key was used

To debug this issue, first ensure you followed the steps for setting up a new account with EA enablement:

  1. When you were approved into the program, you received two emails: 1) Confirming your acceptance into the VSS EA program and 2) “Welcome to NVIDIA NGC”.

  2. Click on the link in the “Welcome to NVIDIA NGC” email.

    NGC Signup Email

    Note

    If you get an INVALID_REQUEST status when clicking on the link, that means you have already created a new NGC account with enablement and must select that account when logging in.

  3. After clicking the link, you are brought to an accounts page where you must select Create New NVIDIA Cloud Account, as this specific Cloud Account will be the one with the VSS EA enablement. Do not click an existing account!

    NGC Signup Email

    Note

    If you click on an existing account with “owner” access, you will get an INTERNAL_ERROR.

  4. Select a distinct name for your cloud account that will be associated with your VSS enablement.

  5. You should now be able to follow the steps to Obtain NGC API Key.

If you followed the above steps and still receive the 403 Forbidden error, please check that:

  1. You have the correct Organization/Team selected in the top right.

  2. The selected Organization/Team has an active NVIDIA VSS Early Access subscription ( Organization > Subscriptions ). A command-line check of chart access is shown below.
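To re-verify chart access from the command line, you can try fetching the chart directly. This is a minimal sketch, assuming your key is exported in the NGC_API_KEY environment variable (the username is the literal string $oauthtoken):

# attempt to fetch the blueprint chart directly; a 403 here confirms an account or key problem
sudo microk8s helm fetch https://helm.ngc.nvidia.com/nvidia/blueprint/charts/nvidia-blueprint-vss-2.0.0.tgz \
    --username='$oauthtoken' --password=$NGC_API_KEY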

1-click deploy package deploy-aws-cns.tar.gz untar failure#

Error

tar: This does not look like a tar archive


This can occur for multiple reasons, most commonly:

  • The tar file was not downloaded correctly

  • The tar file is corrupted

  • The token used to download the file does not have the necessary permissions

If the user does not have access to the NGC org: nvidia, team: blueprint, the download appears to succeed, but the downloaded file is not a valid tar archive and the untar step fails.

To debug this issue, first ensure you have access to the NGC org: nvidia, team: blueprint, then verify the downloaded file as shown below.
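As a first check, you can inspect what was actually downloaded. This is a minimal sketch, assuming the file is in the current directory; if the download returned an error page instead of the package, file will report text data rather than gzip compressed data:

# check the type of the downloaded file; a valid package is reported as gzip compressed data
file deploy-aws-cns.tar.gz

# list the archive contents without extracting; errors here indicate a corrupted or invalid download
tar -tzf deploy-aws-cns.tar.gz > /dev/null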

Deployment FAQs#

How do I monitor progress of VSS helm chart deployment?#

Use microk8s kubectl get po -A and microk8s kubectl describe po commands to see progress of VSS deployment.

Check Default Deployment Topology and Models in Use to see the names of pods involved in VSS deployment.

You can use microk8s kubectl describe po POD_NAME to check the status of each pod individually while it is initializing.

The unique Pod name can be found using microk8s kubectl get po -A.
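For example (the pod name below is illustrative; copy the actual name from the get po -A output, and add -n <namespace> if the pod is not in the default namespace):

# list all pods and their current status
sudo microk8s kubectl get po -A

# inspect the events and init status of a single pod
sudo microk8s kubectl describe po vss-vss-deployment-xxxxx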

What are some of the common errors users may encounter trying to deploy VSS?#

  • Insufficient VRAM on deployment GPUs or insufficient CPU RAM.

Check Prerequisites for more info on the exact memory requirements.

  • OpenAI API Key not having GPT-4o model access.

Make sure you have access to the GPT-4o model API endpoint at https://platform.openai.com/apps.

Make sure you have enough credits available under Settings > Usage (https://platform.openai.com/settings/organization/usage) and review the rate limits under Settings > Limits. A quick command-line check of the API key is shown after this list.

  • Incorrect version of NVIDIA Drivers.

Check Prerequisites for more info on the exact NVIDIA driver requirement and a link to download the driver.

GPU operator issues were observed with newer driver versions.

  • GPU Operator: NVIDIA CUDA validator error: Failed to allocate device vector

sudo microk8s kubectl logs -n gpu-operator-resources nvidia-cuda-validator-* -c cuda-validation
Failed to allocate device vector A (error code system not yet initialized)!
[Vector addition of 50000 elements]

Users may need to manually install the NVIDIA Fabric Manager. This is documented in Install the NVIDIA Fabric Manager.

Note

If the issue persists after installing the NVIDIA Fabric Manager, re-enable the GPU operator with sudo microk8s enable nvidia force-system-driver.

  • Invalid access to NGC.

Make sure you provide a valid NGC API key by setting up secrets as mentioned in Create Required Secrets.

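To quickly check the OpenAI API key from the command line, here is a minimal sketch, assuming the key is exported as OPENAI_API_KEY:

# should return a model list that includes gpt-4o; an authentication or permission error indicates a key or access problem
curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"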

Live (RTSP) stream connect error#

A potential error users can encounter in VSS when trying to connect an RTSP stream: ERROR Could not connect to the RTSP URL or there is no video stream from the RTSP URL.

This can be a network access issue: the RTSP server might not be accessible from inside the VSS container running in k8s.

To debug and confirm that the RTSP server is accessible from inside the k8s environment:

  1. Launch an Ubuntu pod in k8s OR use the VSS container. The temporary pod can be removed when done (see the cleanup command after step 3).

# this will start an Ubuntu container in k8s and exec into it
sudo microk8s kubectl run ubuntu-debug --image=ubuntu --restart=Never -i --tty -- /bin/bash

  2. Install ffprobe and run ffprobe <rtsp>.

#Then install ffprobe and try connecting to the camera from inside the container
apt-get update
apt-get install -y ffmpeg
ffprobe <camera_rtsp_stream_url>

This command will attempt to connect to the camera and provide information about the stream.

  3. If step (2) prints relevant stream information, it confirms that the container has the necessary permissions and network access to connect to the camera. Otherwise, users may need to check for potential issues like network connectivity, firewall settings, RTSP URL format, server status, codec support, DNS resolution, container permissions, server load, etc.
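Once debugging is complete, the temporary pod created in step (1) can be removed:

# delete the temporary debug pod
sudo microk8s kubectl delete pod ubuntu-debug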

Where to see VSS logs?#

VSS logs are written to the /tmp/via-logs/ directory inside the running VSS container in the pod: vss-vss-deployment-*.

Refer to VSS Observability for more information on VSS observability.

Why do I see VSS deployment pod restart?#

If for any reason the VSS pod errors out, it should restart and try to self-correct.

One reason for this is an LLM Exception error showing that VSS exceeded its maximum number of internal retries while trying to connect to the LLM pod.

More details on this can be found in Known Issues.

If this happens, please wait a few additional minutes and observe whether a restart fixes the issue.

Users can monitor this using sudo watch microk8s kubectl get pod.

Why do I see “CUDA out of memory” while running summarization?#

This can happen if the GPU does not have enough memory to run the summarization.

The VLM_BATCH_SIZE is automatically determined based on the available GPU memory if it is not explicitly set. Consider setting a lower VLM_BATCH_SIZE manually on the VSS pod to avoid OOM errors.
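As an illustration, the environment variable could be set in a Helm overrides file along these lines. This is only a sketch: it assumes the chart accepts extra container environment variables at this path, and the batch size of 8 is an arbitrary example value to be tuned for your GPU memory:

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VLM_BATCH_SIZE
            value: "8"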

Live stream preview is not working / Set-of-Marks preview for files or live-streams is not working#

This requires additional proprietary codecs to be installed in the container. Refer to Custom Container Image with Codecs Installed or Installing additional codecs.

Why do I see “Couldn’t Produce Highlights. Please try again.” while generating (scenario) highlights?#

This can happen if the LLM fails to return a valid JSON response. Please retry the highlight generation.

Why do I see “Sorry, I don’t see that in the video.” during QnA?#

This can happen if there is not enough context to answer the question. Users need to fine-tune the prompts. Refer to Tuning Prompts for more details. This could also happen if there are a lot of stale video files and their stale data remains in the graph. Users should ensure that they delete video files that are no longer in use.

Multiple streams are failing due to OSError: [Errno 28] No space left on device#

This happens because the default shared memory size of the VSS pod is 64MB. To fix this, users need to increase the shared memory size of the VSS pod.

This can be set in the overrides file as shown below:

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          volumeMounts:
          - name: shm-volume
            mountPath: /dev/shm
      volumes:
      - name: shm-volume
        emptyDir:
          medium: Memory
          sizeLimit: 500Mi

For the docker compose deployment, it can be set in the compose.yaml file as shown below:

services:
  via-server:
    shm_size: 500mb

When triggering summarization for a large number of files back to back, I see VSS pod restarts or connection errors.#

This can happen if Guardrails takes a long time to validate the summarization requests.

To fix this, users can either increase the VSS pod probe timeouts as shown below or Disable Guardrails if not required.

sudo microk8s helm install ... \
    --set "vss.applicationSpecs.vss-deployment.containers.vss.livenessProbe.timeoutSeconds=5" \
    --set "vss.applicationSpecs.vss-deployment.containers.vss.readinessProbe.timeoutSeconds=5"

When using remote endpoints with GraphRAG, I am seeing the Exception: [429] Too Many Requests error:#

Error

Exception: [429] Too Many Requests

This can happen if the remote endpoint is not able to handle the request rate. You can update the ca_rag_config.yaml file to limit the number of parallel requests to the remote endpoint:

chat:
   rag: graph-rag
   params:
      embedding_parallel_count: 50

How do I view the per-chunk dense captions generated by the VLM?#

There are a few ways to retrieve the per-chunk dense captions generated by the VLM:

  1. Start VSS with VSS_LOG_LEVEL=DEBUG. This will print the per-chunk dense captions in the VSS logs. Look for VLM response generated for in the logs.

  2. Start VSS with CA-RAG disabled. This will disable all RAG features like summarization, alerts and chat. With this, the /summarize API will return the per-chunk dense captions instead of the summary. For the Gradio UI, make sure to uncheck the Enable Chat checkbox. For API usage, make sure to use the streaming mode.

  3. Start VSS with Health Evaluation enabled. For each summarize request, a file /tmp/via-logs/vlm_testdata_<request-id>.txt containing the per-chunk dense captions will be created inside the VSS container.
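For option 1, assuming DEBUG logging is enabled and the request has completed, the captions can be searched for directly in the VSS logs inside the pod (the pod name below is illustrative):

# search the VSS logs for the per-chunk VLM responses
sudo microk8s kubectl exec -it vss-vss-deployment-xxxxx -- grep -r "VLM response generated for" /tmp/via-logs/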

How do I setup a sample RTSP stream to test with VSS?#

A sample RTSP stream can be created from a video file with cvlc using the following commands:

sudo apt install -y vlc
cvlc --loop <video-file> ":sout=#gather:rtp{sdp=rtsp://:8554/file-stream}" :network-caching=1500 :sout-all :sout-keep

This will start the RTSP stream on rtsp://<NODE_IP>:8554/file-stream.

Note

VLC supports only the UDP protocol for RTSP streaming. UDP may not work if the VLC server is started on a different node than the VSS pod. It is recommended to start the VLC server on the same node as the VSS pod.
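To verify that the sample stream is reachable before adding it to VSS, you can probe it from a machine with ffmpeg installed (similar to the RTSP debugging steps above):

# should print stream information such as codec and resolution if the stream is up
ffprobe rtsp://<NODE_IP>:8554/file-stream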

When do I need NVIDIA Fabric Manager in the deployment node(s)?#

  1. When REQUIRED: NVSwitch-Based Systems

  • Fabric Manager is essential for systems using NVSwitch hardware to create a unified memory fabric across multiple GPUs. This applies to:

  • NVIDIA DGX and HGX server platforms

  2. When NOT Required:

  • Systems using basic NVLink bridges between GPUs (without NVSwitches)

  • Single-GPU configurations

  • Traditional PCIe-connected GPU setups

More info here.

Local LLM deployment fails with “Detected 0 compatible profile” error#

This can happen if the GPU does not have enough memory to run the LLM or the LLM NIM does not have a compatible profile for the GPU.

To fix this, check the list of model profiles supported by the LLM NIM using the “list-model-profiles” command as specified in the LLM NIM utilities documentation.

More information regarding various model profiles can be found in the LLM NIM model profiles documentation.
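As an illustration, for a containerized LLM NIM the command is typically run through docker. The image name and tag below are only placeholders; use the LLM NIM image from your deployment:

# list the model profiles supported by the LLM NIM on this system
docker run --rm --gpus=all -e NGC_API_KEY nvcr.io/nim/<llm-nim-image>:<tag> list-model-profiles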

How do I assign fixed ports to VSS API and UI services in case of helm deployment?#

By default, VSS API and UI services are assigned random node ports.

To assign fixed ports, users can pass the following parameters to the helm install command:

sudo microk8s helm install ... \
    --set-json 'vss.applicationSpecs.vss-deployment.services.vss-service.ports=[{"name":"http-api","port":8000, "nodePort": 31000},{"name":"webui","port":9000, "nodePort": 32000}]'

The VSS API service will be available at http://<NODE_IP>:31000 and the VSS UI service will be available at http://<NODE_IP>:32000.
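To confirm which node ports were actually assigned, you can list the services (the VSS service name contains vss-service):

# show the VSS services and their node port mappings
sudo microk8s kubectl get svc -A | grep vss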

Why do I see error “Server is already processing maximum number of live streams (256)”?#

By default, VSS is configured to allow only 256 live-stream summarization requests to be processed in parallel.

To configure a higher limit:

  • For docker compose deployment, add VSS_EXTRA_ARGS="--max-live-streams <max-streams>" in the .env file.

  • For helm deployment, set VSS_EXTRA_ARGS to "--max-live-streams <max-streams>" using the overrides file as shown in Configuration Options (a minimal sketch is shown below).
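For the helm deployment option, a minimal overrides sketch could look like the following. It assumes the chart accepts extra container environment variables at this path, and the limit of 512 is only an example value:

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VSS_EXTRA_ARGS
            value: "--max-live-streams 512"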