FAQ#

Overview#

This page contains frequently asked questions and their answers. In some cases, steps for debugging and resolving issues are provided. Questions that are not addressed on this page, as well as known issues, can be raised on the official forum.

VIOS

Long Video Summarization (LVS) MS

Remote VLM (nim model-type) and video access

Local LLM and VLM deployments on OTHER hardware

VIOS#

Why do large file uploads to VIOS fail?#

Upload size is governed by two settings (NGINX client_max_body_size and nv_streamer_max_upload_file_size_MB). The effective limit is the lower of the two. Refer to Configuring upload file size limit for configuration steps.
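For illustration, the NGINX side of the limit might look like the following snippet (the exact configuration file and location depend on your deployment; the value is a placeholder):

```nginx
# Hypothetical snippet: raise the NGINX request-body limit to 10 GB.
# The effective upload limit is still the lower of client_max_body_size
# and nv_streamer_max_upload_file_size_MB, so raise both together.
http {
    client_max_body_size 10G;
}
```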

Why are video uploads to VIOS slow?#

Slow uploads are commonly caused by VPN overhead. Upload from a machine on the same local network as the VIOS deployment when possible.

How do I re-encode a video and disable B-frames before uploading to VIOS?#

Videos with B-frames or incompatible encoding settings may fail to play back or stream correctly. Re-encode with B-frames disabled and a fixed keyframe interval before uploading. Refer to Synchronize Streaming of Videos for the recommended ffmpeg command.
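For reference, a typical re-encode of this kind is sketched below. This assumes an H.264 target and a 30-frame keyframe interval; it is not necessarily the exact command that the linked section recommends:

```shell
# Re-encode input.mp4 with B-frames disabled (-bf 0) and a fixed
# keyframe interval (-g 30), copying the audio stream unchanged.
ffmpeg -i input.mp4 -c:v libx264 -bf 0 -g 30 -c:a copy output.mp4
```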

Long Video Summarization (LVS) MS#

Can the LVS MS process multiple videos simultaneously?#

No. The LVS MS processes one video at a time to ensure optimal GPU utilization. Use a queue system for batch processing.
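The queue can live entirely on the client side. A minimal sketch follows; the `summarize` callable stands in for whatever request you make to the LVS MS (for example, a POST to its summarize endpoint) and is an assumption, not part of the MS API:

```python
from collections import deque

def process_queue(videos, summarize):
    """Submit videos strictly one at a time, since the LVS MS
    handles a single video per request."""
    results = []
    pending = deque(videos)
    while pending:
        video = pending.popleft()
        results.append(summarize(video))  # blocks until this video finishes
    return results
```

Because each call blocks until the previous video is done, the MS never sees more than one video in flight.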

What video formats are supported?#

The LVS MS supports common formats: MP4, AVI, MOV, MKV, and WebM. For additional formats, install proprietary codecs by setting INSTALL_PROPRIETARY_CODECS=true.

How do I use a custom VLM model in the LVS MS?#

Set the VLM_MODEL_TO_USE environment variable and provide the model path through the MODEL_ROOT_DIR volume mount.
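As a hypothetical illustration, a docker run invocation might wire these together as follows. The image name, model name, and paths are placeholders; check the deployment documentation for the exact mount target:

```shell
# Hypothetical sketch: select a custom VLM and mount its weights.
# /opt/models/my-custom-vlm on the host holds the model files.
docker run \
  -e VLM_MODEL_TO_USE=my-custom-vlm \
  -v /opt/models:/models \
  -e MODEL_ROOT_DIR=/models \
  lvs-ms:latest
```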

How can I change the VLM prompt for summarization?#

You can customize the VLM prompt by using the following fields in your API request: override_vlm_prompt and prompt. Here is an example of how to use them in a curl command:

curl --location 'http://localhost:38111/summarize' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "<video url>",
    "model": "<model name>",
    "events": [
      <event list>
    ],
    "scenario": "<scenario>",
    "override_vlm_prompt": true,
    "prompt": "<Your prompt goes here>\n\nProvide the result in JSON format with \"seconds\" for time depiction for each event.\nUse the following keywords in the JSON output: '\''start_time'\'', '\''end_time'\'', '\''description'\'', \"type\".\nThe \"type\" field should correspond to an event type from the event list.\n\nExample output format:\n{\n  \"start_time\": t_start,\n  \"end_time\": t_end,\n  \"description\": \"EVENT1\",\n  \"type\": \"event_type from the event list\"\n}\n\nMake sure the answer contains correct timestamps."
  }'

Replace <Your prompt goes here> and <event list> with your custom values as needed.

Keep the output format as is so that the VLM generates output that the downstream pipeline can process.

Remote VLM (nim model-type) and video access#

How does a remote nim VLM access videos?#

When you use a remote VLM of model-type nim (not openai), that VLM runs elsewhere and must connect back to your host on port 30888 to fetch videos. The VLM must be able to reach that port on:

  • The external IP passed via -e, if you provide one when running the agent workflow, or

  • Otherwise, the internal host IP you pass via -i or the auto-derived internal host IP.

Ensure that your network and firewall rules allow the nim VLM to reach port 30888 on the appropriate IP from wherever the VLM runs.
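A quick way to verify reachability from the machine the VLM runs on is a plain TCP connect. A small sketch (the host IP shown is a placeholder):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run this from where the nim VLM executes, against the IP you passed
# via -e (external) or -i (internal), e.g.:
# port_reachable("203.0.113.10", 30888)
```

If this returns False, the failure is a network or firewall issue rather than a problem with the agent workflow itself.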

Local LLM and VLM deployments on OTHER hardware#

NVIDIA A100 (80 GB)#

Can I run local LLM and VLM on NVIDIA A100 80 GB GPUs?#

Yes. The default models (nvidia/nvidia-nemotron-nano-9b-v2 for the LLM and nvidia/cosmos-reason2-8b for the VLM) have each been verified to work on a dedicated A100 (80 GB) GPU with no environment overrides. There are no tested overrides that make these defaults work on a shared GPU.

NVIDIA H200#

Can I run local LLM and VLM on NVIDIA H200 GPUs?#

Yes. The H200 is supported for local LLM and VLM deployments. The default LLM and VLM will work in shared mode if the override environment variables include the following:

LLM:

  • NIM_KVCACHE_PERCENT=0.4

  • NIM_MAX_NUM_SEQS=4

  • NIM_MAX_MODEL_LEN=128000

  • NIM_LOW_MEMORY_MODE=1

VLM:

  • NIM_KVCACHE_PERCENT=0.4

  • NIM_MAX_MODEL_LEN=32768

  • NIM_MAX_NUM_SEQS=4

  • MAX_JOBS=4

  • NIM_DISABLE_MM_PREPROCESSOR_CACHE=1
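Put together, the overrides above might be passed as environment flags when starting the two containers. The sketch below is illustrative only; the container images and any other flags your deployment needs are placeholders:

```shell
# LLM container (image name is a placeholder)
docker run \
  -e NIM_KVCACHE_PERCENT=0.4 \
  -e NIM_MAX_NUM_SEQS=4 \
  -e NIM_MAX_MODEL_LEN=128000 \
  -e NIM_LOW_MEMORY_MODE=1 \
  llm-nim:latest

# VLM container (image name is a placeholder)
docker run \
  -e NIM_KVCACHE_PERCENT=0.4 \
  -e NIM_MAX_MODEL_LEN=32768 \
  -e NIM_MAX_NUM_SEQS=4 \
  -e MAX_JOBS=4 \
  -e NIM_DISABLE_MM_PREPROCESSOR_CACHE=1 \
  vlm-nim:latest
```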