Release Notes

Release 1.3.0

  • Added support for Llama-3.2-NV-EmbedQA-1B-v2 embedding model.

  • Added support for dynamic embedding sizes via Matryoshka Representation Learning (for supported models).

  • Added the NIM_NUM_MODEL_INSTANCES and NIM_NUM_TOKENIZERS environment variables.

  • Added support for dynamic batching in the underlying Triton Inference Server process.
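The new environment variables above can be passed at container launch. A minimal sketch, assuming a hypothetical image tag and illustrative values (the release notes do not specify defaults or ranges):

```shell
# Illustrative launch command: the image name/tag and the values chosen
# for the new variables are assumptions, not taken from the release notes.
docker run -d --gpus all \
  -e NGC_API_KEY \
  -e NIM_NUM_MODEL_INSTANCES=2 \
  -e NIM_NUM_TOKENIZERS=4 \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:latest
```

Raising the instance and tokenizer counts trades GPU memory for additional request concurrency; suitable values depend on the deployment.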

Known Issues

  • The current version of langchain-nvidia-ai-endpoints used in the LangChain playbook is not compatible with the Llama-3.2-NV-EmbedQA-1B-v2 NIM.

Release 1.2.0

  • Updated NV-EmbedQA-E5-v5 NIM to use Triton Inference Server 24.08.

  • Added the NIM_TRITON_GRPC_PORT environment variable to set the gRPC port for the Triton Inference Server.
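The gRPC port variable can be set the same way as other NIM_* variables at launch. A sketch with an assumed image name and an illustrative port value:

```shell
# Illustrative only: the image name/tag and the chosen port are assumptions.
docker run -d --gpus all \
  -e NGC_API_KEY \
  -e NIM_TRITON_GRPC_PORT=50051 \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/nv-embedqa-e5-v5:latest
```

Overriding the port is mainly useful when the default conflicts with another service in the same network namespace.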

Release 1.1.0

  • Updated the NV-EmbedQA-E5-v5 NIM to use the standard NIM library and tools.

Release 1.0.1

  • Added support for NGC Personal/Service API keys in addition to the NGC API Key (Original).

  • The NGC_API_KEY environment variable is no longer required when running a container with a pre-populated model cache (NIM_CACHE_PATH).

  • Updated the list-model-profiles command to check the correct location for model artifacts.

Release 1.0.0

Summary

This is the first general release of the NeMo Retriever Text Embedding NIM.

Embedding Models

  • NV-EmbedQA-E5-v5

  • NV-EmbedQA-Mistral7B-v2

  • Snowflake’s Arctic-embed-l
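As a usage sketch, an embedding request to one of these models via the NIM's OpenAI-compatible endpoint might look like the following. The host, port, model identifier, and the input_type field are assumptions based on a typical NIM deployment, not statements from these notes:

```shell
# Hypothetical request; the host/port and model identifier are assumptions.
curl -s http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/nv-embedqa-e5-v5",
        "input": ["What is the capital of France?"],
        "input_type": "query"
      }'
```

The response is a JSON object containing one embedding vector per input string.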