# Release Notes

These Release Notes describe the key features, software enhancements and improvements, and known issues for the VSS release product package.
## VSS 2.3.0
These are the VSS 2.3.0 Release Notes.
### Key Features and Enhancements
- Support for audio in summarization and Q&A.
- Support for preprocessing a video to generate Set-of-Marks (SOM) prompting and additional CV metadata for better accuracy.
- Multi-stream support for Q&A.
- Gradio UI improvements.
- Additional runtime parameters that can be configured through the `/summarize` API (see the example after this list):
  - `summarize_top_p`, `summarize_temperature`, `summarize_max_tokens`: LLM sampling parameters for summarization.
  - `chat_top_p`, `chat_temperature`, `chat_max_tokens`: LLM sampling parameters for Q&A.
  - `notification_top_p`, `notification_temperature`, `notification_max_tokens`: LLM sampling parameters for alerts/event detection.

  More info here: API Documentation.
- New API `/alerts/recent` to get recent alerts for all live streams (see the example after this list).
- Stability improvements.
- Single GPU deployment.
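For illustration, the sketch below exercises the new sampling parameters and the `/alerts/recent` endpoint with `curl`. Only the parameter names and endpoint paths come from this release; the host/port, the `id` field, the request body shape, and the sample values are assumptions for this sketch and should be verified against the API Documentation.

```bash
# Hypothetical host/port and request body -- verify against the API Documentation.
# Summarization request overriding the new LLM sampling parameters.
curl -s http://localhost:8100/summarize \
  -H "Content-Type: application/json" \
  -d '{
        "id": "<asset-id>",
        "summarize_top_p": 0.7,
        "summarize_temperature": 0.2,
        "summarize_max_tokens": 512,
        "chat_top_p": 0.7,
        "chat_temperature": 0.2,
        "chat_max_tokens": 256,
        "notification_top_p": 0.7,
        "notification_temperature": 0.2,
        "notification_max_tokens": 256
      }'

# Fetch recent alerts for all live streams through the new endpoint.
curl -s http://localhost:8100/alerts/recent
```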
## VSS 2.2.0

These are the VSS 2.2.0 Release Notes. This release is an engineering release that introduces some new features. It also includes several fixes and additional changes over previous VSS releases.
### Key Features and Enhancements
- Enhanced multi-stream / concurrent mode support.
- GraphRAG performance improvements.
- Support for the NVILA research model. More info here: Configuring for NVILA model.
- Additional runtime parameters that can be configured through the `/summarize` API (see the example after this list):
  - `vlm_input_width`, `vlm_input_height`: Configure the input resolution of the frames passed to the VLM.
  - `num_frames_per_chunk`: Configure the number of frames to sample from each chunk.
  - `summarize_batch_size`: LLM batch size for summarization.
  - `rag_type`: Choose between "graph-rag" and "vector-rag".
  - `rag_top_k`: Number of top rerank results to use during Q&A.
  - `rag_batch_size`: Number of VLM captions to be batched together when creating the graph.

  More info here: API Documentation.
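As above, a hedged `curl` sketch of how these parameters might be passed to the `/summarize` API. Only the parameter names are taken from this release; the host/port, the `id` field, the body shape, and the sample values are assumptions; see the API Documentation for the authoritative schema.

```bash
# Hypothetical host/port, body shape, and values -- verify against the API Documentation.
curl -s http://localhost:8100/summarize \
  -H "Content-Type: application/json" \
  -d '{
        "id": "<asset-id>",
        "vlm_input_width": 448,
        "vlm_input_height": 448,
        "num_frames_per_chunk": 10,
        "summarize_batch_size": 5,
        "rag_type": "vector-rag",
        "rag_top_k": 5,
        "rag_batch_size": 1
      }'
```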
### Compatibility

The TensorRT version in the VSS container has been upgraded, requiring new TensorRT engines to be built for the VILA-1.5 model. Make sure to remove any stale TensorRT engines for VILA-1.5.

For Helm deployments, this can be done by deleting the model cache PVC:

```bash
sudo microk8s kubectl delete pvc vss-ngc-model-cache-pvc
```
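To confirm the stale cache is actually gone before redeploying (assuming the same microk8s-based Helm deployment as above), the remaining PVCs can be listed:

```bash
# vss-ngc-model-cache-pvc should no longer appear in this list;
# it should be re-created, and fresh engines built, on the next deployment.
sudo microk8s kubectl get pvc
```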
### Known Issues
Multi-session Q&A is not currently supported. Users should try chat only on a single file or live stream at a time. Trying chat on multiple files and/or live streams may lead to incorrect replies. This does not affect summarization.
The Gradio UI sometimes becomes unresponsive. This can manifest in ways such as a live stream not being deleted even after clicking the Delete Live Stream button. The VSS REST API can be used as an alternative for live stream deletion in this case, as sketched below.
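A minimal sketch of deleting a live stream through the REST API; the port and the `/live-stream` endpoint paths are assumptions here and should be confirmed against the API Documentation:

```bash
# Endpoint paths and port are assumptions -- check the API Documentation.
# List live streams to find the ID of the stuck stream, then delete it.
curl -s http://localhost:8100/live-stream
curl -s -X DELETE http://localhost:8100/live-stream/<stream-id>
```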
Sometimes, deleting a live stream does not work.
The current VSS release also supports the NVILA research model as the VLM. However, optimizations for the model are under development; currently, only FP16 precision is supported.
Models are trained on specific data/use cases, so testing them on other inputs might give incorrect results.
VLM model accuracy: Timestamps returned are sometimes inaccurate, and the model can hallucinate on certain questions. Prompt tuning might be required.
Summarization accuracy: Summarization accuracy is heavily dependent on VLM accuracy. Also, the default configs have been tuned for the warehouse use case. Users can supply custom VLM and summarization prompts to the `/summarize` API.

The following harmless warning might be seen during VSS application execution and can be safely ignored:
```
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument. Aborting
```
Due to a browser limitation, loading multiple Gradio sessions in the same browser may cause the sessions to get stuck or appear slow.
Guardrails might not reject some prompts that are expected to be rejected. This could be because the prompt is relevant in other contexts, or because topics in the prompt are not configured to be rejected. You can try tuning the guardrails configuration if required.
OpenAI connection errors or 429 (too many requests) errors might sometimes be seen if too many requests are sent to the GPT-4V or GPT-4o VLMs. This can be due to low TPM/RPM limits associated with the OpenAI account.
CA-RAG summarization might return a truncated summary response. This is due to the max_tokens setting; try increasing it in the CA-RAG config file.
Helm deployment: The VSS deployment pod may fail with the error `LLM call Exception: llm-nim-svc`. In spite of having an init container wait for the LLM pod to come up, the VSS deployment can error out for an unknown reason, like below:

```
2024-11-27 17:51:44,763 ERROR Failed to load VIA stream handler - LLM Call Exception: HTTPConnectionPool(host='llm-nim-svc', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2c9d0ad6c0>: Failed to establish a new connection: [Errno 111] Connection refused'))
```

If this happens, please wait an additional few minutes; a pod restart fixes the issue.
Users can monitor this using:

```bash
sudo watch microk8s kubectl get pod
```
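If the pod does not recover on its own, it can be restarted manually by deleting it (Kubernetes re-creates pods managed by a Deployment); the pod name below is a placeholder to be read from the `get pod` output:

```bash
# Deleting the pod triggers a restart; copy the actual name from `kubectl get pod`.
sudo microk8s kubectl delete pod <vss-deployment-pod-name>
```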
The Gradio UI might be slow to load thumbnails and video previews for longer videos. This becomes especially noticeable over slower network connections.
Deleting RTSP streams can sometimes hang. This is because rtspsrc indefinitely retries TCP transport after a UDP timeout when the timeout property is set; once the TCP link is established, pipeline teardown hangs. This is a GStreamer issue: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1570. A workaround is to export VSS_RTSP_TIMEOUT=0, which disables TCP transport after UDP timeout, as shown below. However, this could cause streaming to not work at all when the network is poor.
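For a Docker-based deployment, the workaround is just an environment variable set before starting VSS (for Helm, it would instead be added to the pod environment; the exact mechanism depends on the deployment):

```bash
# Disable retrying with TCP transport after a UDP timeout (see the GStreamer issue above).
# Note: with a poor network connection, streaming may then fail entirely.
export VSS_RTSP_TIMEOUT=0
```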