
Optimization

NeMo Retriever Text Embedding NIM (Text Embedding NIM) automatically leverages model- and hardware-specific optimizations intended to improve the performance of embedding models.

The NVIDIA TensorRT-accelerated NIM backend provides support for optimized versions of common models across a number of NVIDIA GPUs. If an optimized engine does not exist for a SKU being used, a GPU-agnostic ONNX backend (using the CUDA Execution Provider) is used instead.

The TensorRT NIM backend includes multiple optimization profiles, each tailored to the floating-point precision formats supported by a given SKU.

Text Embedding NIM automatically selects the most suitable profile from the list of compatible profiles based on the detected hardware. Each profile consists of different parameters that influence the selection process. The selection logic is as follows:

  • Compatibility check: Text Embedding NIM filters out the profiles that are not runnable with the detected configuration based on the number and type of GPUs available.

  • Backend: This can be either TensorRT or ONNX. The optimized TensorRT profiles are preferred over ONNX when available.

  • Precision: Lower-precision profiles are preferred when available. For example, Text Embedding NIM automatically selects FP8 profiles over FP16 profiles.
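The ranking described above can be sketched as a simple sort. This is an illustrative model only: the field names, rank tables, and `select_profile` function below are hypothetical, not the NIM's internal API.

```python
# Hypothetical sketch of the profile-selection logic described above.
# Field names and values are illustrative, not the NIM's internal API.

PRECISION_RANK = {"FP8": 0, "FP16": 1, "FP32": 2}  # lower precision preferred
BACKEND_RANK = {"TensorRT": 0, "ONNX": 1}          # TensorRT preferred over ONNX

def select_profile(profiles, detected_gpu):
    # 1. Compatibility check: keep profiles runnable on the detected GPU.
    #    GPU-agnostic ONNX profiles carry no "gpu" field and always pass.
    compatible = [p for p in profiles if p.get("gpu") in (None, detected_gpu)]
    # 2. Backend, then 3. precision, decide the ordering.
    compatible.sort(key=lambda p: (BACKEND_RANK[p["backend"]],
                                   PRECISION_RANK[p["precision"]]))
    return compatible[0] if compatible else None

profiles = [
    {"id": "onnx", "backend": "ONNX", "precision": "FP16"},
    {"id": "NVIDIA-H100-80GB-HBM3_10.0.1_12_FP8", "backend": "TensorRT",
     "precision": "FP8", "gpu": "NVIDIA-H100-80GB-HBM3"},
    {"id": "NVIDIA-H100-80GB-HBM3_10.0.1_12", "backend": "TensorRT",
     "precision": "FP16", "gpu": "NVIDIA-H100-80GB-HBM3"},
]

# On an H100, the TensorRT FP8 profile wins; on an unknown GPU,
# only the GPU-agnostic ONNX profile remains.
print(select_profile(profiles, "NVIDIA-H100-80GB-HBM3")["id"])
```

On the example profiles from the log below, this picks the FP8 TensorRT profile on an H100 and falls back to the ONNX profile on any GPU without an optimized engine.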

This selection is logged at startup. For example:

```
Detected 3 compatible profile(s).
Valid profile: onnx (type=ONNX,precision=FP16)
Valid profile: NVIDIA-H100-80GB-HBM3_10.0.1_12_FP8 (type=TensorRT,precision=FP8,gpu=NVIDIA-H100-80GB-HBM3)
Valid profile: NVIDIA-H100-80GB-HBM3_10.0.1_12 (type=TensorRT,precision=FP16,gpu=NVIDIA-H100-80GB-HBM3)
Selected profile: NVIDIA-H100-80GB-HBM3_10.0.1_12_FP8
Profile metadata: type: TensorRT
Profile metadata: precision: FP8
Profile metadata: gpu: NVIDIA-H100-80GB-HBM3
Profile metadata: trt_version: 10.0.1
Profile metadata: cuda_major_version: 12
```

Overriding Profile Selection

Attention

To override the automatic selection, set a specific profile ID with -e NIM_MODEL_PROFILE=<value>. The following list-model-profiles command lists the available profiles for the Text Embedding NIM image referenced by $IMG_NAME:

```
docker run --rm --runtime=nvidia --gpus=all $IMG_NAME list-model-profiles

MODEL PROFILES
- Compatible with system and runnable:
  - onnx (type=ONNX,precision=FP16)
  - NVIDIA-H100-80GB-HBM3_10.0.1_12_FP8 (type=TensorRT,precision=FP8,gpu=NVIDIA-H100-80GB-HBM3)
  - NVIDIA-H100-80GB-HBM3_10.0.1_12 (type=TensorRT,precision=FP16,gpu=NVIDIA-H100-80GB-HBM3)
```

In the previous example, you can set -e NIM_MODEL_PROFILE="NVIDIA-H100-80GB-HBM3_10.0.1_12" to run the H100 FP16 profile.
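A full invocation might look like the following sketch. The image reference in $IMG_NAME and any additional flags (ports, NGC API key, cache mounts) follow the deployment setup used elsewhere in this documentation and are not shown here.

```shell
# Pin the H100 FP16 TensorRT profile instead of the auto-selected FP8 one.
# $IMG_NAME is the Text Embedding NIM image reference used earlier on this page.
docker run --rm --runtime=nvidia --gpus=all \
  -e NIM_MODEL_PROFILE="NVIDIA-H100-80GB-HBM3_10.0.1_12" \
  $IMG_NAME
```

At startup, the selection log shown above will then report the pinned profile rather than the automatically chosen one.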

© Copyright 2024, NVIDIA Corporation. Last updated on Jul 23, 2024.