Release Notes for NeMo Retriever Text Embedding NIM#
This documentation contains the release notes for NeMo Retriever Text Embedding NIM.
Release 1.7.0 - Early Access Only#
Summary#
- Added support for the Llama-nemoretriever-vlm-embedqa-1b model. For details, see Support Matrix for NeMo Retriever Text Embedding NIM.
- Added a new `modality` field to the `/v1/embeddings` endpoint to support text, image, and mixed (text+image) input types. For details, see Modality.
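As a sketch of the new field, the snippet below builds a `/v1/embeddings` request body that sets `modality`. The payload shape follows the OpenAI-compatible embeddings schema that the NIM exposes; the model id and the accepted modality values shown here are assumptions, so check the Modality page for the exact values your release supports.

```python
import json

# Sketch of a /v1/embeddings request using the new "modality" field.
# The model id and the modality values are assumptions -- see the
# Modality documentation for the values your release accepts.
payload = {
    "model": "nvidia/llama-nemoretriever-vlm-embedqa-1b",  # assumed model id
    "input": ["A photo of a golden retriever"],
    "modality": "text",  # image and mixed (text+image) inputs are also supported
}

request_body = json.dumps(payload)
print(request_body)
```

The resulting JSON string can be POSTed to the NIM's `/v1/embeddings` endpoint with any HTTP client.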
Known Issues#
Currently, there are no known issues in release 1.7.0.
Release 1.6.0#
Summary#
Added support for B200 GPU. For details, see Support Matrix for NeMo Retriever Text Embedding NIM.
Known Issues#
- The `list-model-profiles` command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.
- Slight performance degradation observed since the 1.3.1 release.
- For the B200 GPU, Llama-3.2-NV-EmbedQA-1B-v2 requires `NIM_TRT_ENGINE_HOST_CODE_ALLOWED=1` to properly start the NIM.
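As a hedged sketch of the B200 workaround, the launch command below sets the variable at container start. This is a deployment fragment, not a command from these notes: the image name, tag, and port mapping are placeholders, so substitute the values for your deployment.

```shell
# Sketch: starting the NIM on a B200 GPU with the required workaround.
# Image name/tag and port are placeholders -- adjust for your deployment.
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_TRT_ENGINE_HOST_CODE_ALLOWED=1 \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:latest
```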
Release 1.5.1#
Summary#
- Fixed a bug where the `list-model-profiles` command fails to run on hosts that don't have an NVIDIA GPU, even when `NIM_CPU_ONLY` is set.
- Fixed a bug where the `list-model-profiles` command returns `custom` models that should not be used.
Known Issues#
- The `list-model-profiles` command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.
- Slight performance degradation observed since the 1.3.1 release.
Release 1.5.0#
Summary#
- Added support for the bge-m3 embedding model. For details, refer to Support Matrix.
- Added support for the bge-large-zh-v1.5 embedding model.
- Added the `NIM_TRITON_PERFORMANCE_MODE` environment variable so you can select performance modes optimized for low latency or high throughput.
- Added the `NIM_TRITON_MAX_BATCH_SIZE` environment variable.
- Added support for a configurable memory footprint by letting users set batch size and sequence length.
- Added support for gRPC.
- Reduced container image sizes.
- Removed model profiles for the A100 PCIe 40GB and H100 PCIe 80GB configurations.
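As a hedged sketch of the new tuning knobs, the launch command below sets both variables at container start. The mode value `throughput`, the batch size, and the image name/tag are assumptions rather than values stated in these notes; confirm the accepted values in the configuration documentation for your release.

```shell
# Sketch: selecting a performance mode and capping the Triton batch size.
# The mode value and image name/tag are placeholders -- confirm the
# accepted values in the configuration docs.
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_TRITON_PERFORMANCE_MODE=throughput \
  -e NIM_TRITON_MAX_BATCH_SIZE=64 \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:latest
```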
Known Issues#
- The `list-model-profiles` command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.
- The `list-model-profiles` command fails to run on hosts that don't have an NVIDIA GPU, even when `NIM_CPU_ONLY` is set.
- The `list-model-profiles` command returns `custom` models that should not be used.
Release 1.4.0-rtx (Beta)#
Summary#
This is a public beta release of NeMo Retriever Text Embedding NIM. This release contains the following changes:
- Added support for the GeForce RTX 4090, NVIDIA RTX 6000 Ada Generation, GeForce RTX 5080, and GeForce RTX 5090 GPUs for the Llama-3.2-NV-EmbedQA-1B-v2 NIM.
Known Issues#
- The `list-model-profiles` command incorrectly lists compatible model profiles as incompatible. Select the profile that matches your hardware configuration. This bug does not impact automatic profile selection.
Release 1.3.1#
- Added the `NIM_SERVED_MODEL_NAME` environment variable.
- Updated the LangChain Playbook to use the Llama-3.2-NV-EmbedQA-1B-v2 NIM.
Release 1.3.0#
- Added support for the Llama-3.2-NV-EmbedQA-1B-v2 embedding model.
- Added support for dynamic embedding sizes via Matryoshka Representation Learning (for supported models).
- Added the `NIM_NUM_MODEL_INSTANCES` and `NIM_NUM_TOKENIZERS` environment variables.
- Added support for dynamic batching in the underlying Triton Inference Server process.
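As a sketch of requesting a dynamic embedding size, the snippet below builds an embeddings payload that asks for a reduced dimension. The `dimensions` field follows the OpenAI-compatible embeddings schema; the model id, the `input_type` value, and the size 384 are assumptions, so confirm what your model supports before relying on them.

```python
import json

# Sketch: requesting a truncated embedding size from a model that supports
# Matryoshka Representation Learning. The model id, "input_type" value,
# and the size 384 are assumptions -- check your model's supported sizes.
payload = {
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input": ["How do I configure the embedding size?"],
    "input_type": "query",  # retrieval models distinguish query vs. passage
    "dimensions": 384,      # reduced embedding size (assumed supported)
}

request_body = json.dumps(payload)
print(request_body)
```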
Known Issues#
- The current version of `langchain-nvidia-ai-endpoints` used in the LangChain playbook is not compatible with the Llama-3.2-NV-EmbedQA-1B-v2 NIM.
Release 1.2.0#
- Updated the NV-EmbedQA-E5-v5 NIM to use Triton Inference Server 24.08.
- Added the `NIM_TRITON_GRPC_PORT` environment variable to set the gRPC port for Triton Inference Server.
Release 1.1.0#
- Updated the NV-EmbedQA-E5-v5 NIM to use the standard NIM library and tools.
Release 1.0.1#
- Added support for NGC Personal/Service API keys in addition to the NGC API Key (Original).
- `NGC_API_KEY` is no longer required when running a container with a pre-populated cache (`NIM_CACHE_PATH`).
- Updated the `list-model-profiles` command to check the correct location for model artifacts.
Release 1.0.0#
Summary#
This is the first general release of the NeMo Retriever Text Embedding NIM.
Embedding Models#
- NV-EmbedQA-E5-v5
- NV-EmbedQA-Mistral7B-v2
- Snowflake’s Arctic-embed-l