Optimal Topology

VSS Deployment Topologies

VSS supports several deployment topologies, each optimized for different GPU types and performance requirements. Which topology to choose depends on your hardware configuration.

Default Topology

The default topology dedicates 4 GPUs to the LLM NIM, 2 GPUs to the VSS ingestion and retrieval pipeline, and 1 GPU each to the NeMo embedding and reranking NIMs, for a total of 8 GPUs. This topology is designed for systems where a single GPU cannot host multiple NIMs, for example, systems with L40S GPUs.

For details on the default topology configuration, see Default Deployment Topology and Models in Use.
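
The default split can be written down as a simple GPU-to-service map, which makes the "one NIM per GPU" constraint easy to check. The following is a minimal, hypothetical Python model, not VSS configuration: the service names and the device indices 0 to 7 are assumptions chosen for an 8-GPU node, and none of them are keys from the VSS deployment files.

```python
# Hypothetical model of the default VSS topology on an 8-GPU node.
# Service names and device indices are illustrative assumptions,
# not keys from the VSS Helm chart or any VSS config file.
DEFAULT_TOPOLOGY: dict[str, list[int]] = {
    "llm_nim": [0, 1, 2, 3],             # 4 GPUs for the LLM NIM
    "vss_ingestion_retrieval": [4, 5],   # 2 GPUs for VSS ingestion and retrieval
    "embedding_nim": [6],                # 1 dedicated GPU for the embedding NIM
    "reranking_nim": [7],                # 1 dedicated GPU for the reranking NIM
}


def total_gpus(topology: dict[str, list[int]]) -> int:
    """Count the distinct physical GPUs used by the topology."""
    return len({gpu for gpus in topology.values() for gpu in gpus})


def is_dedicated(topology: dict[str, list[int]]) -> bool:
    """Return True if no GPU is assigned to more than one service."""
    assigned = [gpu for gpus in topology.values() for gpu in gpus]
    return len(assigned) == len(set(assigned))


if __name__ == "__main__":
    print(total_gpus(DEFAULT_TOPOLOGY))    # 8
    print(is_dedicated(DEFAULT_TOPOLOGY))  # True
```

Because every service owns its GPUs exclusively, the model reports 8 distinct GPUs and no sharing, matching the 4 + 2 + 1 + 1 split described above.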

Shared GPU Topology

On high-performance GPUs such as H100, H200, or A100 (80 GB or more of device memory), there is no need to dedicate individual GPUs to the embedding and reranking NIMs. On these systems, the GPU-sharing topology is recommended for better GPU utilization and higher throughput.

For configuration details, see Optional Deployment Topology with GPU sharing.
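
To contrast this with the default layout, the sketch below models one possible GPU-sharing placement and counts how many services land on each GPU. It is a hypothetical illustration only: the service names, the device indices, the choice of GPU 4 as the shared device, and the assumption that the GPUs freed by sharing are given to the VSS pipeline are all assumptions, not the layout defined in the referenced configuration section.

```python
from collections import Counter

# Hypothetical GPU-sharing layout on an 8-GPU H100/H200/A100 (80 GB) node.
# ASSUMPTIONS: service names, device indices, sharing GPU 4, and giving
# the freed GPUs to the VSS pipeline are illustrative, not VSS defaults.
SHARED_TOPOLOGY: dict[str, list[int]] = {
    "llm_nim": [0, 1, 2, 3],
    "vss_ingestion_retrieval": [4, 5, 6, 7],  # assumed: freed GPUs go to VSS
    "embedding_nim": [4],                     # co-located on GPU 4
    "reranking_nim": [4],                     # co-located on GPU 4
}


def services_per_gpu(topology: dict[str, list[int]]) -> Counter:
    """Count how many services are placed on each GPU index."""
    return Counter(gpu for gpus in topology.values() for gpu in gpus)


if __name__ == "__main__":
    load = services_per_gpu(SHARED_TOPOLOGY)
    print(len(load))  # 8 distinct GPUs in use
    print(load[4])    # 3 services share GPU 4
    print(load[0])    # 1 service (the LLM NIM) on GPU 0
```

The design point the model captures is that co-locating the small embedding and reranking NIMs on a large-memory GPU stops them from each occupying a whole device, so those GPUs can do other work.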

Note

For optimal performance on H100, H200, or A100 GPUs, always use the shared GPU topology. The default topology may not fully utilize the capabilities of these high-performance GPUs.