# Release Notes

## Release 1.3.0
- Added support for the Llama-3.2-NV-EmbedQA-1B-v2 embedding model.
- Added support for dynamic embedding sizes via Matryoshka Representation Learning (for supported models).
- Added the `NIM_NUM_MODEL_INSTANCES` and `NIM_NUM_TOKENIZERS` environment variables.
- Added support for dynamic batching in the underlying Triton Inference Server process.
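The new environment variables can be passed at container launch. The following is a minimal sketch only; the image name, tag, port mapping, and the exact semantics of the instance counts are illustrative assumptions, not confirmed defaults:

```shell
# Sketch: setting the new tuning variables at launch.
# NIM_NUM_MODEL_INSTANCES: number of model instances to serve (assumed semantics).
# NIM_NUM_TOKENIZERS: number of tokenizer instances (assumed semantics).
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_NUM_MODEL_INSTANCES=2 \
  -e NIM_NUM_TOKENIZERS=4 \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:latest
```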
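For the dynamic embedding sizes added in this release, a request might look like the following sketch. The host, port, model name, and the OpenAI-style `dimensions` parameter are assumptions used to illustrate the feature, not confirmed API details:

```shell
# Hypothetical request for a reduced-dimension embedding (MRL).
# Endpoint, model name, and the `dimensions` field are assumptions.
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
        "input": ["What is retrieval-augmented generation?"],
        "input_type": "query",
        "dimensions": 384
      }'
```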
### Known Issues
- The current version of `langchain-nvidia-ai-endpoints` used in the LangChain playbook is not compatible with the Llama-3.2-NV-EmbedQA-1B-v2 NIM.
## Release 1.2.0
- Updated the NV-EmbedQA-E5-v5 NIM to use Triton Inference Server 24.08.
- Added the `NIM_TRITON_GRPC_PORT` environment variable to set the gRPC port for the Triton Inference Server.
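Overriding the gRPC port might look like the following sketch; the image name, tag, and chosen port number are illustrative assumptions:

```shell
# Sketch: overriding the Triton Inference Server gRPC port at launch.
# Image and port values are illustrative, not confirmed defaults.
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_TRITON_GRPC_PORT=8001 \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nv-embedqa-e5-v5:latest
```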
## Release 1.1.0
- Updated the NV-EmbedQA-E5-v5 NIM to use the standard NIM library and tools.
## Release 1.0.1
- Added support for NGC Personal/Service API keys in addition to the NGC API Key (Original).
- `NGC_API_KEY` is no longer required when running a container with a pre-populated cache (`NIM_CACHE_PATH`).
- Updated the `list-model-profiles` command to check the correct location for model artifacts.
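Running from a pre-populated cache without an API key might look like the following sketch; the host cache path, container cache path, and image name are illustrative assumptions:

```shell
# Sketch: starting from a pre-populated model cache, with no NGC_API_KEY set.
# Paths and image are illustrative assumptions.
docker run -it --rm --gpus all \
  -e NIM_CACHE_PATH=/opt/nim/.cache \
  -v /path/to/populated/cache:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nv-embedqa-e5-v5:latest
```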
## Release 1.0.0

### Summary
This is the first general release of the NeMo Retriever Text Embedding NIM.
### Embedding Models
- NV-EmbedQA-E5-v5
- NV-EmbedQA-Mistral7B-v2
- Snowflake’s Arctic-embed-l