API Reference (gRPC) for NVIDIA NeMo Retriever Embedding NIM

This documentation contains the gRPC reference for NVIDIA NeMo Retriever Embedding NIM.

gRPC Models

The gRPC model names differ from the NIM model IDs shown in the Support Matrix. The following table maps each model ID to its gRPC model name.

| Model ID | gRPC Model Name |
|---|---|
| nvidia/llama-nemotron-embed-1b-v2 | nvidia_llama_nemotron_embed_1b_v2 |
| nvidia/nv-embedqa-e5-v5 | nvidia_nv_embedqa_e5_v5 |
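
As a minimal sketch, the mapping above can be kept as a lookup table in client code. The helper name `grpc_model_name` is hypothetical and is not part of the NIM API; the dictionary entries are copied from the table.

```python
# Lookup copied from the mapping table above.
GRPC_MODEL_NAMES = {
    "nvidia/llama-nemotron-embed-1b-v2": "nvidia_llama_nemotron_embed_1b_v2",
    "nvidia/nv-embedqa-e5-v5": "nvidia_nv_embedqa_e5_v5",
}


def grpc_model_name(model_id: str) -> str:
    """Return the gRPC model name for a NIM model ID (hypothetical helper)."""
    try:
        return GRPC_MODEL_NAMES[model_id]
    except KeyError:
        # Only the two documented models are known here; do not guess others.
        raise ValueError(f"No gRPC model name documented for {model_id!r}") from None
```

Both documented rows happen to replace `/` and `-` with `_`, but that is an observation about these two entries rather than a documented rule, so the sketch uses an explicit lookup instead of string substitution.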

Request Inputs

| Input | Shape | Data Type | Description | Required |
|---|---|---|---|---|
| text | [batch_size, 1] | BYTES | A list of UTF-8 encoded strings to embed. For details on how to encode multimodal data as a string, refer to How to Specify Modality. | Yes |
| modality | [batch_size, 1] | BYTES | A list of UTF-8 encoded modality strings, one for each element of the text input. If you don’t specify modality, the modality is inferred. For supported modalities, refer to How to Specify Modality. | No |
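
The `[batch_size, 1]` shape means each string sits in its own single-element row. The following sketch shows one way to arrange the payload in plain Python before handing it to a gRPC client. The helper names `to_column` and `build_inputs` are hypothetical, and the modality value `"text"` in the usage note is a placeholder assumption; refer to How to Specify Modality for the supported values.

```python
from typing import Dict, List, Optional


def to_column(values: List[str]) -> List[List[bytes]]:
    """Encode strings as UTF-8 bytes and arrange them as [batch_size, 1]."""
    return [[value.encode("utf-8")] for value in values]


def build_inputs(
    texts: List[str], modalities: Optional[List[str]] = None
) -> Dict[str, List[List[bytes]]]:
    """Assemble the required `text` input and the optional `modality` input."""
    inputs = {"text": to_column(texts)}
    if modalities is not None:
        # The table above describes one modality string per text element.
        if len(modalities) != len(texts):
            raise ValueError("modality needs one entry per text element")
        inputs["modality"] = to_column(modalities)
    return inputs
```

For example, `build_inputs(["What is a GPU?", "How do embeddings work?"], ["text", "text"])` yields two inputs with two rows each. Omitting the second argument leaves `modality` out of the request, in which case the modality is inferred.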

Request Parameters

| Parameter | Data Type | Description | Valid Values | Default | Required |
|---|---|---|---|---|---|
| input_type | String | The context of the embedding. | "query", "passage" | "query" | Yes |
| truncate | String | How to handle text that exceeds the maximum token length. | "END", "START", "NONE" | "NONE" | Yes |
| dimensions | Integer | The desired dimensionality of the output embeddings. Must be supported by the model. | - | The model’s default dimension. | No |
| embedding_type | String | The output type of the embeddings. See How to Specify Embedding Type for how the output type is handled. | "float", "binary", "ubinary", "int8", "uint8" | "float" | No |
| nvcf_asset_dir | String | Directory path where NVCF (NVIDIA Cloud Functions) asset files are stored. See API Reference (OpenAI) for more details. | - | - | No |
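
Checking parameters on the client side before sending a request can surface mistakes early. The following is a minimal sketch, assuming the parameters are collected in a plain dictionary; the helper name `build_parameters` is hypothetical, and how the dictionary is attached to the gRPC request depends on the client library you use. `nvcf_asset_dir` is left out for brevity.

```python
from typing import Dict, Optional, Union

# Valid values copied from the table above.
VALID_VALUES = {
    "input_type": {"query", "passage"},
    "truncate": {"END", "START", "NONE"},
    "embedding_type": {"float", "binary", "ubinary", "int8", "uint8"},
}


def build_parameters(
    input_type: str,
    truncate: str,
    dimensions: Optional[int] = None,
    embedding_type: Optional[str] = None,
) -> Dict[str, Union[str, int]]:
    """Validate request parameters against the documented valid values."""
    # input_type and truncate are marked as required, so they are always sent.
    params: Dict[str, Union[str, int]] = {
        "input_type": input_type,
        "truncate": truncate,
    }
    if embedding_type is not None:
        params["embedding_type"] = embedding_type
    for name, allowed in VALID_VALUES.items():
        if name in params and params[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    if dimensions is not None:
        # Whether a given size is supported is model-specific and is not
        # checked here; only obviously invalid values are rejected.
        if isinstance(dimensions, bool) or not isinstance(dimensions, int) or dimensions <= 0:
            raise ValueError("dimensions must be a positive integer")
        params["dimensions"] = dimensions
    return params
```

Optional parameters that are not supplied are omitted from the dictionary, so the documented defaults apply on the server side.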

Response

| Output | Shape | Data Type | Description |
|---|---|---|---|
| token_count | [batch_size] | INT32 | The number of tokens in each input text. |
| embeddings | [batch_size, embedding_dimension] | Configurable using the embedding_type request parameter. The default is FLOAT32. | The resulting embedding vectors. |
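
A client can sanity-check a response against the documented shapes before using it. The sketch below assumes the outputs have already been converted to nested Python lists; the helper name `check_response` is hypothetical and is not part of the NIM API.

```python
from typing import List, Sequence


def check_response(
    token_count: Sequence[int],
    embeddings: Sequence[Sequence[float]],
    batch_size: int,
) -> int:
    """Check the documented output shapes and return embedding_dimension."""
    # token_count is documented as [batch_size].
    if len(token_count) != batch_size:
        raise ValueError("token_count must have one entry per input text")
    # embeddings is documented as [batch_size, embedding_dimension].
    if len(embeddings) != batch_size:
        raise ValueError("embeddings must have one row per input text")
    row_lengths = {len(row) for row in embeddings}
    if len(row_lengths) != 1:
        raise ValueError("all embedding rows must share one dimension")
    return row_lengths.pop()
```

The returned value is the length of each embedding row. How that length relates to the `dimensions` request parameter for non-float output types is covered in How to Specify Embedding Type and is not assumed here.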