Embeddings

Generate embeddings by using the Text Embedding NIM microservices through the NIM Proxy microservice.

The examples on this page use the nvidia/nv-embedqa-e5-v5 Text Embedding NIM microservice through the NIM Proxy microservice.

Prerequisites

Before you start, make sure that you have:

  • Access to the NIM Proxy microservice through the base URL where the service is deployed. Store the base URL in the NIM_PROXY_BASE_URL environment variable.

  • A valid embedding model name. To retrieve the list of models deployed as NIM microservices in your environment and exposed through the NIM Proxy microservice, use the ${NIM_PROXY_BASE_URL}/v1/models API. For more information, see List Models.
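As a minimal sketch of the model lookup described in the prerequisites, the following standard-library Python reads the model names from ${NIM_PROXY_BASE_URL}/v1/models. The helper names (models_url, model_ids, list_models) are hypothetical, and the response shape, a "data" array whose entries carry the model name in "id", is an assumption based on the OpenAI-style list format rather than something this page confirms.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the List Models URL from the NIM Proxy base URL."""
    return base_url.rstrip("/") + "/v1/models"


def model_ids(payload: dict) -> list[str]:
    """Extract model names from a parsed response body.

    Assumption: the body looks like {"data": [{"id": "<model name>"}, ...]}.
    """
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str, timeout: float = 10.0) -> list[str]:
    """Send GET /v1/models to the NIM Proxy and return the model names."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

With the environment variable set, calling list_models(os.environ["NIM_PROXY_BASE_URL"]) after import os would return the names you can pass as the model value in the requests below.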

To Generate Embeddings

Choose one of the following options to generate embeddings.

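Create a NeMoMicroservices client instance using the base URL of the NIM Proxy microservice and perform the task as follows. The example reads the base URLs from the NEMO_BASE_URL and NIM_PROXY_BASE_URL environment variables.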

import os

from nemo_microservices import NeMoMicroservices

# Create a client that sends inference requests to the NIM Proxy microservice.
client = NeMoMicroservices(
    nemo_base_url=os.environ["NEMO_BASE_URL"],
    inference_base_url=os.environ["NIM_PROXY_BASE_URL"],
)

# Request an embedding for a single input string.
response = client.embeddings.create(
    input="This is some text to be embedded.",
    model="nvidia/nv-embedqa-e5-v5",
    input_type="passage",
    encoding_format="float",
    user="user-identifier",
    truncate="NONE",
)

print(response)

Use the following cURL command. For details on the request body, refer to the NIM for LLMs API reference and find the API with the same name, v1/embeddings. The NIM Proxy API endpoint routes your requests to the NIM for LLMs microservice.

curl -X 'POST' \
    "${NIM_PROXY_BASE_URL}/v1/embeddings" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "input": "This is some text to be embedded.",
        "model": "nvidia/nv-embedqa-e5-v5",
        "input_type": "passage",
        "encoding_format": "float",
        "user": "user-identifier",
        "truncate": "NONE"
    }'

Example Response

{
    "object": "list",
    "data": [
        {
            "index": 8191,
            "embedding": [
                0
            ],
            "object": "embedding"
        }
    ],
    "model": "nvidia/nv-embedqa-e5-v5",
    "usage": {
        "prompt_tokens": 0,
        "total_tokens": 0
    }
}
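
The cURL request and the response format shown above can also be handled from Python with only the standard library. The following is a sketch, not the documented client: the helper names (build_embeddings_request, send, extract_vectors) are hypothetical, the request fields mirror the cURL example (the optional user field is left out), and the parsing assumes the response has the "data", "index", and "embedding" keys shown in the example response.

```python
import json
import urllib.request


def build_embeddings_request(base_url: str, text: str, model: str) -> urllib.request.Request:
    """Build a POST request for the v1/embeddings endpoint of the NIM Proxy."""
    body = {
        "input": text,
        "model": model,
        "input_type": "passage",
        "encoding_format": "float",
        "truncate": "NONE",
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def extract_vectors(payload: dict) -> list[list[float]]:
    """Return the embedding vectors from a response body, ordered by "index"."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Keeping request construction and response parsing separate from the network call makes both easy to check without a running deployment; a full call would be extract_vectors(send(build_embeddings_request(base_url, text, model))).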