Model customization overview#

The Video Search and Summarization (VSS) blueprint uses embedding models in two places: object-level embeddings from the Real-Time Computer Vision (RT-CV) microservice for detection crops and text queries, and video-level embeddings from the Real-Time Embedding microservice for clip- and stream-based semantic search. Recent TAO Toolkit releases support fine-tuning the same foundation models that these services use, so you can specialize embeddings for your domain and swap the artifacts into your deployment.

Industry blueprint deployments (for example Warehouse Operations) also ship computer vision perception models (Sparse4D, RT-DETR) that you can fine-tune with TAO and integrate into the Perception microservice. See Sparse4D and RT-DETR.

This section maps which components accept customized weights, where to fine-tune (documentation and NGC model cards), and how to integrate fine-tuned models. For low-level service settings (environment variables, DeepStream config keys, APIs), see the linked microservice guides.

Note

Changing embedding dimension or similarity geometry usually requires re-indexing stored vectors (for example Elasticsearch indices used by search workflows) and revisiting similarity thresholds in analytics. Plan a full pipeline validation after swapping models.
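To make the re-indexing concern concrete, here is a minimal sketch of rebuilding an Elasticsearch-style `dense_vector` mapping after a model swap changes the embedding dimension. The index layout, field names (`embedding`, `clip_id`), and the 512/768 dimensions are hypothetical; the real mapping comes from your search workflow.

```python
# Sketch: a dense_vector mapping is fixed at index-creation time, so a
# dimension change forces a new index plus re-embedding of the source clips.
# Field names and dimensions below are illustrative, not from the blueprint.
def make_dense_vector_mapping(dims: int, similarity: str = "cosine") -> dict:
    """Build an index mapping for a dense_vector field with the given dimension."""
    return {
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": similarity,
                },
                "clip_id": {"type": "keyword"},
            }
        }
    }

# An index created for 512-d vectors cannot store 768-d vectors from a
# fine-tuned model; create a fresh index and re-embed the stored content.
old_mapping = make_dense_vector_mapping(512)
new_mapping = make_dense_vector_mapping(768)
```

The `similarity` setting is also part of the mapping, which is why a change in similarity geometry (not just dimension) triggers the same re-indexing and threshold-revalidation work.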

Summary#

| VSS component | Embedding role | Supported model families | Primary integration surface |
|---|---|---|---|
| Object Detection and Tracking (RT-CV) | Object crop embeddings and optional text embeddings (ReID, text-to-image alignment) | RADIO-CLIP, SigLIP v2 | DeepStream INI: vision encoder + text embedder (ONNX / TensorRT); see ReID and Embeddings on that page |
| Real-Time Embedding (RT-Embedding) | Video and text embeddings for semantic search and Kafka-published clip features | Cosmos-Embed1 | Container environment: MODEL_PATH, optional MODEL_IMPLEMENTATION_PATH / Triton repo scripts; see Customizations on that page |
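As an illustration of the RT-Embedding integration surface, the fragment below shows how a fine-tuned checkpoint might be wired in through the container environment. Only the variable names `MODEL_PATH` and `MODEL_IMPLEMENTATION_PATH` come from the table above; the service name and paths are placeholders, and the exact contract is defined on the RT-Embedding Customizations page.

```yaml
# Hypothetical docker-compose fragment; paths and service name are
# placeholders, not values from the blueprint documentation.
services:
  rt-embedding:
    environment:
      # Point the service at the fine-tuned Cosmos-Embed1 artifacts.
      MODEL_PATH: /models/cosmos-embed1-finetuned
      # Optional: custom model implementation / Triton repo scripts.
      MODEL_IMPLEMENTATION_PATH: /models/custom_impl
```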

Per-model guides#

Agent and search workflows#

Developer profiles and agent tools that call embedding endpoints or query Elasticsearch (see VSS-Agent-Profiles) assume embedding spaces that are compatible with the indexed documents. After swapping RT-CV or RT-Embedding models, confirm that:

  • API payloads still match expected vector sizes.

  • Search and fusion logic (for example multi-embedding or attribute search) still uses comparable similarity scales.
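Both checks above can be automated as a small smoke test. The sketch below assumes a hypothetical expected index dimension (`EXPECTED_DIMS`) and plain list-of-float vectors; adapt it to your actual API payloads and index mapping.

```python
import math

# Sketch of post-swap validation; EXPECTED_DIMS is an assumption that must
# match the dimension your search index was created with.
EXPECTED_DIMS = 768

def check_vector(vec: list[float], expected_dims: int = EXPECTED_DIMS) -> None:
    """Fail fast if a returned embedding cannot be indexed or compared."""
    assert len(vec) == expected_dims, (
        f"got {len(vec)}-d vector, index expects {expected_dims}-d"
    )

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: bounded in [-1, 1] regardless of vector norms,
    so it stays comparable across models that only rescale their outputs."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Even at an unchanged dimension, a new model can shift the similarity
# distribution; spot-check known-similar and known-dissimilar pairs before
# trusting previously tuned thresholds in fusion or attribute search.
```

Running such checks against a handful of known query/document pairs after each model swap catches dimension mismatches and threshold drift before they surface as silently degraded search results.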

For LLM and VLM configuration (not embedding fine-tuning), see Configure the LLM and Configure the VLM.