Release Notes for NVIDIA NeMo Retriever Embedding NIM#

This documentation contains the release notes for NVIDIA NeMo Retriever Embedding NIM.

Note

Some releases are labelled “Production Branch” or “(PB)”. Production Branches provide reliable, stable versions of the NIM. Non-production branch releases (sometimes called Feature Branch (FB) releases) contain the latest features, improvements, and optimizations.

Release 1.13.0#

Highlights#

  • Renamed existing models to the new Nemotron brand. The following models are affected:

    • The llama-3.2-nemoretriever-300m-embed-v2 model is now named llama-nemotron-embed-300m-v2.

    • The llama-3.2-nv-embedqa-1b-v2 model is now named llama-nemotron-embed-1b-v2.

  • Added fixes for high and critical vulnerabilities.
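
Client code that still passes the old model identifiers can translate them with a small lookup table. The sketch below is illustrative only: the two old-to-new pairs come from the rename list in the highlights, while the helper name `to_nemotron_name` is hypothetical and not part of the NIM.

```python
# Old-to-new model names, taken from the Release 1.13.0 highlights.
RENAMED_MODELS = {
    "llama-3.2-nemoretriever-300m-embed-v2": "llama-nemotron-embed-300m-v2",
    "llama-3.2-nv-embedqa-1b-v2": "llama-nemotron-embed-1b-v2",
}


def to_nemotron_name(model: str) -> str:
    """Return the new name for a renamed model; leave other names unchanged.

    Hypothetical helper for illustration; it is not part of the NIM.
    The match is exact, so any name not listed above passes through as-is.
    """
    return RENAMED_MODELS.get(model, model)


print(to_nemotron_name("llama-3.2-nv-embedqa-1b-v2"))
```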

Fixed Known Issues#

The following known issues are fixed in this release:

  • Fixed an issue with the persistence.enabled Helm chart value. The persistent storage options (persistence.storageClass, persistence.existingClaim, hostPath.enabled) are now fully functional.
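
As a rough illustration of how these chart values might be passed on the command line, the sketch below flattens a nested dictionary into `--set key=value` arguments for Helm. The value names come from the fix described above. The helper `helm_set_args` and the storage class name `standard` are assumptions made for the example, so check the chart's own values file before relying on them.

```python
def helm_set_args(values: dict, prefix: str = "") -> list[str]:
    """Flatten nested chart values into `--set key=value` arguments for Helm."""
    args: list[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so {"persistence": {"enabled": ...}} becomes persistence.enabled
            args.extend(helm_set_args(value, path))
        else:
            # Helm expects lowercase booleans (true/false).
            text = str(value).lower() if isinstance(value, bool) else str(value)
            args.extend(["--set", f"{path}={text}"])
    return args


# Assumed example values: "standard" is a placeholder storage class name.
persistence_values = {
    "persistence": {"enabled": True, "storageClass": "standard"},
}

print(" ".join(helm_set_args(persistence_values)))
```

The resulting arguments can be appended to a `helm install` or `helm upgrade` invocation for the chart.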

Release 1.12.0#

Highlights#

Known Issues#

  • The persistence.enabled value and all configuration flags that depend on it are currently non-functional in the NIM Helm chart.

Release 1.11 - Production Branch Only#

This release is a production branch.

Highlights#

Known Issues#

There are no known issues in this release.

Release 1.10.1#

This release is a patch release.

Highlights#

Known Issues#

  • The persistence.enabled value and all configuration flags that depend on it are currently non-functional in the NIM Helm chart.

Release 1.10.0#

Summary#

Known Issues#

  • The persistence.enabled value and all configuration flags that depend on it are currently non-functional in the NIM Helm chart.

Release 1.9.0#

Summary#

Known Issues#

  • The persistence.enabled value and all configuration flags that depend on it are currently non-functional in the NIM Helm chart.

Release 1.8 - Production Branch Only#

Summary#

  • 1.8.0: Added support for H200 NVL GPU for the NV-EmbedQA-E5-v5 NIM. For details, see NV-EmbedQA-E5-v5.

  • 1.8.0: Added FP8 support for H100 and L40s for the NV-EmbedQA-E5-v5 NIM. For details, see NV-EmbedQA-E5-v5.

  • 1.8.1 - 1.8.x: Added CVE fixes for high and critical vulnerabilities.

Release 1.7.0 - Early Access Only#

Summary#

Known Issues#

  • Currently, only unoptimized generic model profiles are supported.

Release 1.6.0#

Summary#

Known Issues#

  • The list-model-profiles command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.

  • A slight performance degradation has been observed since the 1.3.1 release.

  • For the B200 GPU, Llama-3.2-NV-EmbedQA-1B-v2 requires NIM_TRT_ENGINE_HOST_CODE_ALLOWED=1 to properly start the NIM.
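
The B200 requirement can be captured in a launch script. The following is a minimal sketch, assuming the GPU name is available as a string such as "NVIDIA B200". The helper names are hypothetical, and only the variable `NIM_TRT_ENGINE_HOST_CODE_ALLOWED=1` and the affected model come from the known issue above.

```python
def nim_env_for_gpu(gpu_name: str, model: str) -> dict[str, str]:
    """Return extra environment variables for a GPU/model pair.

    Covers only the single case described in the Release 1.6.0 known issues.
    """
    env: dict[str, str] = {}
    if "B200" in gpu_name.upper() and model.lower() == "llama-3.2-nv-embedqa-1b-v2":
        env["NIM_TRT_ENGINE_HOST_CODE_ALLOWED"] = "1"
    return env


def docker_env_flags(env: dict[str, str]) -> list[str]:
    """Render the variables as `-e NAME=value` flags for `docker run`."""
    flags: list[str] = []
    for name, value in env.items():
        flags.extend(["-e", f"{name}={value}"])
    return flags


# "NVIDIA B200" is an assumed spelling of the reported GPU name.
print(docker_env_flags(nim_env_for_gpu("NVIDIA B200", "Llama-3.2-NV-EmbedQA-1B-v2")))
```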

Release 1.5.1#

Summary#

  • Fixed a bug where the list-model-profiles command failed to run on hosts that don’t have an NVIDIA GPU, even when NIM_CPU_ONLY is set.

  • Fixed a bug where the list-model-profiles command returned custom models that should not be used.

Known Issues#

  • The list-model-profiles command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.

  • A slight performance degradation has been observed since the 1.3.1 release.

Release 1.5.0#

Summary#

  • Added support for the bge-m3 embedding model. For details, refer to Support Matrix.

  • Added support for the bge-large-zh-v1.5 embedding model.

  • Added the NIM_TRITON_PERFORMANCE_MODE environment variable to allow you to select performance modes that are optimized for low latency or high throughput.

  • Added the NIM_TRITON_MAX_BATCH_SIZE environment variable.

  • Added support for configurable memory footprint by allowing users to set batch size and sequence length.

  • Added support for gRPC.

  • Reduced container image sizes.

  • Removed model profiles for the A100 PCIe 40GB and H100 PCIe 80GB configurations.

Known Issues#

  • The list-model-profiles command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.

  • The list-model-profiles command fails to run on hosts that don’t have an NVIDIA GPU, even when NIM_CPU_ONLY is set.

  • The list-model-profiles command returns custom models that should not be used.

Release 1.4.0-rtx (Beta)#

Summary#

This is a public beta release of the NVIDIA NeMo Retriever Embedding NIM. This release contains the following changes:

  • Added support for GeForce RTX 4090, NVIDIA RTX 6000 Ada Generation, GeForce RTX 5080, and GeForce RTX 5090 for the Llama-3.2-NV-EmbedQA-1B-v2 NIM.

Known Issues#

  • The list-model-profiles command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.

Release 1.3.1#

Release 1.3.0#

  • Added support for the Llama-3.2-NV-EmbedQA-1B-v2 embedding model.

  • Added support for dynamic embedding sizes via Matryoshka Representation Learning (for supported models).

  • Added NIM_NUM_MODEL_INSTANCES and NIM_NUM_TOKENIZERS environment variables.

  • Added support for dynamic batching in the underlying Triton Inference Server process.
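
Matryoshka Representation Learning trains a model so that the leading components of an embedding remain usable as a smaller embedding on their own. The sketch below shows the general idea on the client side: keep a prefix of the vector and rescale it to unit length. It is an illustration of the technique, not a description of how the NIM selects embedding sizes, and the function name `truncate_embedding` is hypothetical.

```python
import math


def truncate_embedding(embedding: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale them to unit length."""
    if not 0 < dims <= len(embedding):
        raise ValueError("dims must be between 1 and the embedding length")
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be rescaled
    return [x / norm for x in head]


full = [0.5, 0.5, 0.5, 0.5]  # toy 4-dimensional unit vector
small = truncate_embedding(full, 2)
print(len(small), math.sqrt(sum(x * x for x in small)))
```

Rescaling matters because cosine or dot-product similarity assumes comparable vector lengths, and a truncated prefix is generally shorter than the full vector.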

Known Issues#

  • The current version of langchain-nvidia-ai-endpoints used in the LangChain playbook is not compatible with the Llama-3.2-NV-EmbedQA-1B-v2 NIM.

Release 1.2.0#

  • Updated the NV-EmbedQA-E5-v5 NIM to use Triton Inference Server 24.08.

  • Added the NIM_TRITON_GRPC_PORT environment variable to set the gRPC port for Triton Inference Server.

Release 1.1.0#

  • Updated the NV-EmbedQA-E5-v5 NIM to use the standard NIM library and tools.

Release 1.0.1#

  • Added support for NGC Personal/Service API keys in addition to the NGC API Key (Original).

  • NGC_API_KEY is no longer required when running a container with a pre-populated cache (NIM_CACHE_PATH).

  • Updated the list-model-profiles command to check the correct location for model artifacts.
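
A launch script can mirror the relaxed NGC_API_KEY requirement with a pre-flight check. This is a minimal sketch under a stated assumption: a cache counts as pre-populated when NIM_CACHE_PATH points to an existing, non-empty directory. The NIM's own check may differ, and the function name `missing_ngc_api_key` is hypothetical.

```python
import os
from pathlib import Path


def missing_ngc_api_key(env: dict[str, str]) -> bool:
    """Return True when NGC_API_KEY is needed but not set.

    Assumption: a cache counts as pre-populated if NIM_CACHE_PATH points to
    an existing, non-empty directory. The NIM's real check may differ.
    """
    cache_path = env.get("NIM_CACHE_PATH")
    if cache_path:
        cache_dir = Path(cache_path)
        if cache_dir.is_dir() and any(cache_dir.iterdir()):
            return False  # artifacts already on disk, so no key is needed
    return not env.get("NGC_API_KEY")


if missing_ngc_api_key(dict(os.environ)):
    print("Set NGC_API_KEY, or point NIM_CACHE_PATH at a pre-populated cache.")
```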

Release 1.0.0#

Summary#

This is the first general release of the NVIDIA NeMo Retriever Embedding NIM.

Embedding Models#

  • NV-EmbedQA-E5-v5

  • NV-EmbedQA-Mistral7B-v2

  • Snowflake’s Arctic-embed-l