Date: 2026-03-07

Release Notes for NVIDIA NeMo Retriever Embedding NIM#

This documentation contains the release notes for NVIDIA NeMo Retriever Embedding NIM.

Release 1.13.0#

Highlights#

  • Renamed existing models to the new Nemotron brand. The affected models are the following:

    • The llama-3.2-nemoretriever-300m-embed-v2 model is now named llama-nemotron-embed-300m-v2.

    • The llama-3.2-nv-embedqa-1b-v2 model is now named llama-nemotron-embed-1b-v2.

  • Added fixes for high and critical vulnerabilities.
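
Client code that still refers to the old model names may need a translation step. The following is a minimal sketch of such a lookup, using only the two renames listed above. The helper name current_model_name and the handling of a "publisher/" prefix are assumptions made for illustration, not part of the NIM.

```python
# Old-to-new model names, taken from the rename list in the 1.13.0 highlights.
RENAMED_MODELS = {
    "llama-3.2-nemoretriever-300m-embed-v2": "llama-nemotron-embed-300m-v2",
    "llama-3.2-nv-embedqa-1b-v2": "llama-nemotron-embed-1b-v2",
}


def current_model_name(name: str) -> str:
    """Return the renamed model name, leaving unknown names unchanged.

    Hypothetical convenience: a leading "publisher/" prefix is preserved,
    and the lookup of the short name ignores case.
    """
    prefix, sep, short = name.rpartition("/")
    renamed = RENAMED_MODELS.get(short.lower(), short)
    return f"{prefix}{sep}{renamed}"


if __name__ == "__main__":
    print(current_model_name("llama-3.2-nv-embedqa-1b-v2"))
```

Names that are not in the table pass through untouched, so the helper can be applied to every model identifier in a configuration without special cases.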

Fixed Known Issues#

The following known issues are fixed in this release:

  • Fixed an issue with the persistence.enabled helm chart value. Persistent storage options (persistence.storageClass, persistence.existingClaim, hostPath.enabled) are now fully functional.
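
As an illustration of how these values fit together, the sketch below builds a values override that uses only the keys named above (persistence.enabled, persistence.storageClass, persistence.existingClaim, hostPath.enabled). The function name and the "standard" storage class are placeholders, and how the chart resolves combinations of these options is not described in these notes, so check the chart's own values file before relying on it.

```python
import json


def persistence_values(storage_class=None, existing_claim=None, host_path=False):
    """Build a Helm values override from the keys named in the release note.

    Only keys that are explicitly provided are emitted; everything else is
    left to the chart defaults.
    """
    persistence = {"enabled": True}
    if storage_class is not None:
        persistence["storageClass"] = storage_class
    if existing_claim is not None:
        persistence["existingClaim"] = existing_claim
    return {"persistence": persistence, "hostPath": {"enabled": bool(host_path)}}


if __name__ == "__main__":
    # "standard" is a placeholder storage class name, not a chart default.
    print(json.dumps(persistence_values(storage_class="standard"), indent=2))
```

Because JSON is a subset of YAML, the printed output can be saved to a file and should be usable as a values file with Helm's -f/--values flag.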

Release 1.12.0#

Highlights#

Known Issues#

  • The persistence.enabled value and all related dependent configuration flags are currently non-functional in the NIM helm chart.

Release 1.11 - Production Branch Only#

This release is a production branch.

Highlights#

Known Issues#

There are no known issues in this release.

Release 1.10.1#

This release is a patch release.

Highlights#

Known Issues#

  • The persistence.enabled value and all related dependent configuration flags are currently non-functional in the NIM helm chart.

Release 1.10.0#

Summary#

Known Issues#

  • The persistence.enabled value and all related dependent configuration flags are currently non-functional in the NIM helm chart.

Release 1.9.0#

Summary#

Known Issues#

  • The persistence.enabled value and all related dependent configuration flags are currently non-functional in the NIM helm chart.

Release 1.8 - Production Branch Only#

Summary#

  • 1.8.0: Added support for the H200 NVL GPU for the NV-EmbedQA-E5-v5 NIM. For details, see NV-EmbedQA-E5-v5.

  • 1.8.0: Added FP8 support for H100 and L40S for the NV-EmbedQA-E5-v5 NIM. For details, see NV-EmbedQA-E5-v5.

  • 1.8.1 - 1.8.x: CVE fixes for high & critical vulnerabilities.

Release 1.7.0 - Early Access Only#

Summary#

Known Issues#

  • Currently, only unoptimized generic model profiles are supported.

Release 1.6.0#

Summary#

Known Issues#

  • The list-model-profiles command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.

  • A slight performance degradation has been observed since the 1.3.1 release.

  • For the B200 GPU, Llama-3.2-NV-EmbedQA-1B-v2 requires NIM_TRT_ENGINE_HOST_CODE_ALLOWED=1 to properly start the NIM.
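
One way to apply the B200 workaround is to pass the variable to the container at startup. The sketch below only assembles a docker run command line; the helper name is hypothetical, "<nim-image>" is a placeholder for the image you actually use, and other options a real deployment needs (ports, credentials, cache mounts) are omitted.

```python
import shlex


def docker_run_command(image, extra_env=None, gpus="all"):
    """Assemble a `docker run` argument list with the given environment variables."""
    cmd = ["docker", "run", "--rm", "--gpus", gpus]
    for key, value in sorted((extra_env or {}).items()):
        cmd += ["-e", f"{key}={value}"]
    cmd.append(image)
    return cmd


# The variable and value come from the known issue above.
B200_ENV = {"NIM_TRT_ENGINE_HOST_CODE_ALLOWED": "1"}

if __name__ == "__main__":
    # "<nim-image>" is a placeholder, not a real image reference.
    print(shlex.join(docker_run_command("<nim-image>", B200_ENV)))
```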

Release 1.5.1#

Summary#

  • Fixed a bug where the list-model-profiles command fails to run on hosts that don’t have an NVIDIA GPU, even when NIM_CPU_ONLY is set.

  • Fixed a bug where the list-model-profiles command returns custom models that should not be used.

Known Issues#

  • The list-model-profiles command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.

  • A slight performance degradation has been observed since the 1.3.1 release.

Release 1.5.0#

Summary#

  • Added support for the bge-m3 embedding model. For details, refer to Support Matrix.

  • Added support for the bge-large-zh-v1.5 embedding model.

  • Added the NIM_TRITON_PERFORMANCE_MODE environment variable to allow you to select performance modes that are optimized for low latency or high throughput.

  • Added the NIM_TRITON_MAX_BATCH_SIZE environment variable.

  • Added support for configurable memory footprint by allowing users to set batch size and sequence length.

  • Added support for gRPC.

  • Reduced container image sizes.

  • Removed model profiles for A100 PCIe 40GB & H100 PCIe 80GB configurations.
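
The two Triton-related variables above can be collected in an environment file and handed to the container, for example with docker run --env-file. The sketch below is a hypothetical helper that writes the KEY=VALUE lines. These notes do not list the accepted performance mode values, so the mode string is passed through unchecked and "throughput" in the usage example is an assumed placeholder; only the batch size is validated as a positive integer.

```python
def triton_env_lines(performance_mode=None, max_batch_size=None):
    """Return KEY=VALUE lines for the Triton-related variables added in 1.5.0."""
    lines = []
    if performance_mode is not None:
        # Accepted values are not listed in these notes, so no validation here.
        lines.append(f"NIM_TRITON_PERFORMANCE_MODE={performance_mode}")
    if max_batch_size is not None:
        size = int(max_batch_size)
        if size < 1:
            raise ValueError("max_batch_size must be a positive integer")
        lines.append(f"NIM_TRITON_MAX_BATCH_SIZE={size}")
    return lines


if __name__ == "__main__":
    # "throughput" is an assumed placeholder value; check the configuration
    # reference for the values the NIM actually accepts.
    print("\n".join(triton_env_lines("throughput", 32)))
```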

Known Issues#

  • The list-model-profiles command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.

  • The list-model-profiles command fails to run on hosts that don’t have an NVIDIA GPU, even when NIM_CPU_ONLY is set.

  • The list-model-profiles command returns custom models that should not be used.

Release 1.4.0-rtx (Beta)#

Summary#

This is a public beta release of the NVIDIA NeMo Retriever Embedding NIM. This release contains the following changes:

  • Added support for GeForce RTX 4090, NVIDIA RTX 6000 Ada Generation, GeForce RTX 5080, and GeForce RTX 5090 for the Llama-3.2-NV-EmbedQA-1B-v2 NIM.

Known Issues#

  • The list-model-profiles command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.

Release 1.3.1#

Release 1.3.0#

  • Added support for the Llama-3.2-NV-EmbedQA-1B-v2 embedding model.

  • Added support for dynamic embedding sizes via Matryoshka Representation Learning (for supported models).

  • Added NIM_NUM_MODEL_INSTANCES and NIM_NUM_TOKENIZERS environment variables.

  • Added support for dynamic batching in the underlying Triton Inference Server process.
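
For readers unfamiliar with Matryoshka Representation Learning, the general idea is that a model trained this way produces embeddings whose leading dimensions remain useful on their own, so a shorter vector can be obtained by keeping a prefix and renormalizing. The sketch below illustrates that idea on a plain Python list. It is not the NIM's implementation, and the function name is hypothetical.

```python
import math


def truncate_embedding(vector, dimensions):
    """Keep the first `dimensions` values of an embedding and L2-normalize them."""
    if not 1 <= dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = list(vector[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0]  # toy vector, not real model output
    print(truncate_embedding(full, 2))
```

Renormalizing after truncation keeps cosine and dot-product similarity consistent across vectors of the same reduced size.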

Known Issues#

  • The current version of langchain-nvidia-ai-endpoints used in the LangChain playbook is not compatible with the Llama-3.2-NV-EmbedQA-1B-v2 NIM.

Release 1.2.0#

  • Updated the NV-EmbedQA-E5-v5 NIM to use Triton Inference Server 24.08.

  • Added the NIM_TRITON_GRPC_PORT environment variable to set the gRPC port for Triton Inference Server.

Release 1.1.0#

  • Updated the NV-EmbedQA-E5-v5 NIM to use the standard NIM library and tools.

Release 1.0.1#

  • Added support for NGC Personal/Service API keys in addition to the NGC API Key (Original).

  • NGC_API_KEY is no longer required when running a container with a pre-populated cache (NIM_CACHE_PATH).

  • Updated the list-model-profiles command to check the correct location for model artifacts.
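
A launch script can mirror the NGC_API_KEY change with a simple preflight check. The sketch below is a hypothetical illustration: it treats any non-empty directory at NIM_CACHE_PATH as a pre-populated cache, which is an assumption and not necessarily the test the NIM itself performs.

```python
import os
from pathlib import Path


def api_key_needed(env):
    """Return False when NIM_CACHE_PATH points at a non-empty directory.

    Assumption: "pre-populated" is approximated as "directory exists and
    contains at least one entry".
    """
    cache_path = env.get("NIM_CACHE_PATH")
    if cache_path:
        cache_dir = Path(cache_path)
        if cache_dir.is_dir() and any(cache_dir.iterdir()):
            return False
    return True


def preflight(env=os.environ):
    """Raise early if a key is needed but NGC_API_KEY is not set."""
    if api_key_needed(env) and not env.get("NGC_API_KEY"):
        raise RuntimeError("Set NGC_API_KEY or point NIM_CACHE_PATH at a populated cache")
```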

Release 1.0.0#

Summary#

This is the first general release of the NVIDIA NeMo Retriever Embedding NIM.

Embedding Models#

  • NV-EmbedQA-E5-v5

  • NV-EmbedQA-Mistral7B-v2

  • Snowflake’s Arctic-embed-l