Embedders in NVIDIA NeMo Agent Toolkit#

An embedder, or embedding model, is a model that transforms diverse data, such as text, images, charts, and video, into numerical vectors in a way that captures their meaning and nuance in a multidimensional vector space.

Supported Embedder Providers#

NeMo Agent Toolkit supports the following embedder providers:

Provider

Type

Description

NVIDIA NIM

nim

NVIDIA Inference Microservice (NIM)

OpenAI

openai

OpenAI API

Azure OpenAI

azure_openai

Azure OpenAI API

Hugging Face

huggingface

Local sentence-transformers or remote Inference Endpoints (TEI)

Embedder Configuration#

The embedder configuration is defined in the embedders section of the workflow configuration file. The _type value refers to the embedder provider, and the model_name value always refers to the name of the model to use.

embedders:
  nim_embedder:
    _type: nim
    model_name: nvidia/nv-embedqa-e5-v5
  openai_embedder:
    _type: openai
    model_name: text-embedding-3-small
  azure_openai_embedder:
    _type: azure_openai
    azure_deployment: text-embedding-3-small

NVIDIA NIM#

You can use the following environment variables to configure the NVIDIA NIM embedder provider:

  • NVIDIA_API_KEY - The API key to access NVIDIA NIM resources

The NIM embedder provider is defined by the NIMEmbedderModelConfig class.

  • model_name - The name of the model to use

  • api_key - The API key to use for the model

  • base_url - The base URL to use for the model

  • max_retries - The maximum number of retries for the request

  • truncate - The truncation strategy to use for the model

OpenAI#

You can use the following environment variables to configure the OpenAI embedder provider:

  • OPENAI_API_KEY - The API key to access OpenAI resources

The OpenAI embedder provider is defined by the OpenAIEmbedderModelConfig class.

  • model_name - The name of the model to use

  • api_key - The API key to use for the model

  • base_url - The base URL to use for the model

  • max_retries - The maximum number of retries for the request

Azure OpenAI#

You can use the following environment variables to configure the Azure OpenAI embedder provider:

  • AZURE_OPENAI_API_KEY - The API key to access Azure OpenAI resources

  • AZURE_OPENAI_ENDPOINT - The Azure OpenAI endpoint to access Azure OpenAI resources

The Azure OpenAI embedder provider is defined by the AzureOpenAIEmbedderModelConfig class.

  • api_key - The API key to use for the model

  • api_version - The API version to use for the model

  • azure_endpoint - The Azure OpenAI endpoint to use for the model

  • azure_deployment - The name of the Azure OpenAI deployment to use

Hugging Face#

Hugging Face is an embedder provider that supports both local sentence-transformers models and remote TEI servers or Hugging Face Inference Endpoints. When endpoint_url is provided, embeddings are generated remotely. Otherwise, models are loaded and run locally.

You can use the following environment variables to configure the Hugging Face embedder provider:

  • HF_TOKEN - The API token to access Hugging Face Inference resources

The Hugging Face embedder provider is defined by the HuggingFaceEmbedderConfig class.

  • model_name - The Hugging Face model identifier (for example, BAAI/bge-large-en-v1.5). Required for local embeddings

  • endpoint_url - Endpoint URL for TEI server or Hugging Face Inference Endpoint. When set, uses remote embedding

  • api_key - The Hugging Face API token for authentication

  • timeout - Request timeout in seconds (default: 120.0)

  • device - Device for local models: cpu, cuda, mps, or auto (default: auto)

  • normalize_embeddings - Whether to normalize embeddings to unit length (default: true)

  • batch_size - Batch size for embedding generation (default: 32)

  • max_seq_length - Maximum sequence length for input text

  • trust_remote_code - Whether to trust remote code when loading models (default: false)

embedders:
  # Local sentence-transformers embedder
  local_embedder:
    _type: huggingface
    model_name: sentence-transformers/all-MiniLM-L6-v2
    device: auto
    normalize_embeddings: true

  # Remote TEI or Inference Endpoint embedder
  tei_embedder:
    _type: huggingface
    endpoint_url: http://localhost:8081
    api_key: ${HF_TOKEN}