VSS Observability#

VSS Observability collects runtime metrics such as component latencies and GPU resource usage.

Note

This is an alpha feature. Future releases may improve observability and tracing, and the current observability logs might be deprecated. The logs are written to non-persistent files inside the container, so you must use a tool such as kubectl to copy the data out of the container.

VSS Log Files#

VSS logs are written to the /tmp/via-logs/ directory inside the running VSS container, in the pod whose name matches vss-vss-deployment-*.

Main files to look for:

| Log File Path | Description |
|---|---|
| `/tmp/via-logs/via_engine.log` | VSS engine logs pertaining to Ingestion pipeline |
| `/tmp/via-logs/via_ctx_rag.log` | VSS CA RAG logs pertaining to Retrieval pipeline |
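
Because the log directory is not persistent, the files have to be copied out of the pod before it is restarted. The following is a minimal sketch that only builds a `kubectl cp` argument list for the documented log directory. The pod name, namespace, container name, and local directory are placeholders rather than values from this document; look up the real pod name (matching vss-vss-deployment-*) first, for example with `kubectl get pods`.

```python
import shlex

# Log directory inside the VSS container, as documented above.
VSS_LOG_DIR = "/tmp/via-logs"


def build_copy_command(pod, local_dir, namespace="default", container=None):
    """Return the kubectl argv that copies the VSS log directory to local_dir."""
    # kubectl cp addresses a remote path as <namespace>/<pod>:<path>.
    cmd = ["kubectl", "cp", f"{namespace}/{pod}:{VSS_LOG_DIR}", local_dir]
    if container:
        # -c selects a container when the pod runs more than one.
        cmd += ["-c", container]
    return cmd


if __name__ == "__main__":
    # "vss-vss-deployment-abc123" is a made-up pod name for illustration.
    argv = build_copy_command("vss-vss-deployment-abc123", "./via-logs")
    print(shlex.join(argv))
    # To actually run the copy: subprocess.run(argv, check=True)
```

Building the command as a list (instead of a single string) avoids shell quoting problems if it is later passed to `subprocess.run`.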

VSS Health Evaluation Reports#

When the ENABLE_VIA_HEALTH_EVAL configuration option is enabled, VSS writes latency and GPU usage metrics to files inside the container. Future releases may make these metrics and additional traces available over API endpoints. The feature is intended to provide insight into VSS component latencies and GPU resource usage. In the file names below, <unique_request_id> is the UUID that identifies a single /summarize REST API call to VSS.

| Log File Path | Description |
|---|---|
| `/tmp/via-logs/via_health_summary_<unique_request_id>.json` | Major VSS configuration information and VSS component latencies captured for a single /summarize REST API call |
| `/tmp/via-logs/vlm_testdata_<unique_request_id>.txt` | Dense caption output from the VLM for each chunk processed |
| `/tmp/via-logs/summ_testdata_<unique_request_id>.txt` | Summary output for the /summarize REST API call |
| `/tmp/via-logs/via_gpu_usage_<unique_request_id>.csv`, `/tmp/via-logs/via_plot_gpu_<unique_request_id>.png`, `/tmp/via-logs/via_plot_gpu_mem_<unique_request_id>.png` | GPU usage and GPU memory usage over time, from the /summarize REST API call until the summary response is generated and returned. The values are available in the CSV file and are plotted to the PNG files for quick reference |
| `/tmp/via-logs/via_nvdec_usage_<unique_request_id>.csv`, `/tmp/via-logs/via_plot_nvdec_<unique_request_id>.png` | NVDEC usage for video decode over time, from the /summarize REST API call until the summary response is generated and returned |
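
Once the directory has been copied out of the container, the report files for one request can be found by their shared request ID. The sketch below groups file names by that ID using only the name patterns listed above. The function name and the sample ID are illustrative, and the contents of the files are not parsed because their schema is not described here.

```python
import re
from collections import defaultdict

# File-name prefixes from the table above. via_plot_gpu_mem must come before
# via_plot_gpu so the longer prefix wins in the alternation.
_PREFIXES = (
    "via_health_summary",
    "vlm_testdata",
    "summ_testdata",
    "via_gpu_usage",
    "via_plot_gpu_mem",
    "via_plot_gpu",
    "via_nvdec_usage",
    "via_plot_nvdec",
)
_PATTERN = re.compile(
    r"^(?P<kind>" + "|".join(_PREFIXES) + r")_(?P<request_id>.+)\.(json|txt|csv|png)$"
)


def group_by_request(filenames):
    """Map each request ID to a {kind: filename} dict; other files are skipped."""
    groups = defaultdict(dict)
    for name in filenames:
        match = _PATTERN.match(name)
        if match:
            groups[match["request_id"]][match["kind"]] = name
    return dict(groups)


if __name__ == "__main__":
    # "1234" stands in for a real request UUID.
    sample = [
        "via_engine.log",
        "via_health_summary_1234.json",
        "via_gpu_usage_1234.csv",
        "via_plot_gpu_mem_1234.png",
    ]
    for request_id, files in group_by_request(sample).items():
        print(request_id, sorted(files))
```

In practice the input would be something like `os.listdir("./via-logs")` on the local copy of the log directory.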

To enable health evaluation, add ENABLE_VIA_HEALTH_EVAL to the Helm overrides file described in Configuration Options, as shown in the following snippet. To disable it, remove the ENABLE_VIA_HEALTH_EVAL entry from the env list in the overrides file.

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: ENABLE_VIA_HEALTH_EVAL
            value: "true"
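
The nesting in the overrides file is easy to get wrong, so a quick check before deploying can help. The sketch below assumes the overrides file has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`, which is not part of the standard library). The helper name is hypothetical, and treating only the string "true" (case-insensitively) as enabled is an assumption based on the snippet above, not documented VSS behavior.

```python
# Key path to the env list, taken from the overrides snippet above.
_ENV_PATH = ("vss", "applicationSpecs", "vss-deployment", "containers", "vss", "env")


def health_eval_enabled(overrides):
    """Return True if ENABLE_VIA_HEALTH_EVAL is set to "true" in parsed overrides."""
    node = overrides
    for key in _ENV_PATH:
        # Stop early if any level of the expected nesting is missing.
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    for entry in node or []:
        if isinstance(entry, dict) and entry.get("name") == "ENABLE_VIA_HEALTH_EVAL":
            # Assumption: only "true" (any letter case) counts as enabled.
            return str(entry.get("value")).lower() == "true"
    return False


if __name__ == "__main__":
    parsed = {
        "vss": {
            "applicationSpecs": {
                "vss-deployment": {
                    "containers": {
                        "vss": {
                            "env": [
                                {"name": "ENABLE_VIA_HEALTH_EVAL", "value": "true"}
                            ]
                        }
                    }
                }
            }
        }
    }
    print(health_eval_enabled(parsed))
```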