Text Embedding (Latest)
Microservices

Reference

You can download the complete API spec.

Warning

Every model has a maximum token length. The models section lists the maximum token lengths of the supported models. See the truncate field in the Reference for ways to handle sequences longer than the maximum token length.
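If you handle over-length inputs client-side rather than with the truncate field, long text must be split before it is sent. The helper below is a rough illustrative sketch only: it splits on whitespace-delimited words, which merely approximates token counts; the model's own tokenizer may count tokens differently.

```python
def chunk_words(text: str, max_tokens: int) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-delimited
    words. This only approximates the model's token limit; use the
    model's tokenizer for exact counts."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

Each chunk can then be embedded separately and the results combined downstream (for example, by averaging or by indexing each chunk individually).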

Warning

NV-Embed-QA and E5 models operate in passage or query mode and therefore require the input_type parameter. Use passage when generating embeddings during indexing, and query when generating embeddings during querying. Using the correct input_type is critical: mixing them up results in large drops in retrieval accuracy.

Because the OpenAI API does not accept input_type as a parameter, you can instead append a -query or -passage suffix to the model parameter (for example, NV-Embed-QA-query) and omit the input_type field entirely, keeping the request OpenAI API compliant.

For example, the following two requests are identical.

With the input_type parameter:


curl -X "POST" \
  "http://${HOSTNAME}:${SERVICE_PORT}/v1/embeddings" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": ["What is the population of Pittsburgh?"],
    "model": "nvidia/nv-embedqa-e5-v5",
    "input_type": "query"
  }'

Without the input_type parameter with the -query (or -passage) in the model name:


curl -X "POST" \
  "http://${HOSTNAME}:${SERVICE_PORT}/v1/embeddings" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": ["What is the population of Pittsburgh?"],
    "model": "nvidia/nv-embedqa-e5-v5-query"
  }'

Note that the GTE and GTR models do not accept the input_type parameter, since both the -query and -passage input types are processed in the same way.

Use the examples in this section to help you get started with using the API.

The complete API spec can be found in the Open AI Spec.

List Models

cURL Request

Use the following command to list the available models.


curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/models" \
  -H 'Accept: application/json'

Response


{
  "object": "list",
  "data": [
    {
      "id": "nvidia/nv-embedqa-e5-v5",
      "created": 0,
      "object": "model",
      "owned_by": "organization-owner"
    }
  ]
}
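A response of this shape can be parsed with any JSON library. The sketch below extracts just the model IDs from a /v1/models response body:

```python
import json

def list_model_ids(response_body: str) -> list[str]:
    """Extract the model IDs from a /v1/models response body."""
    payload = json.loads(response_body)
    return [entry["id"] for entry in payload.get("data", [])]
```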

Generate Embeddings

cURL Request


curl -X "POST" \
  "http://${HOSTNAME}:${SERVICE_PORT}/v1/embeddings" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": ["Hello world"],
    "model": "nvidia/nv-embedqa-e5-v5",
    "input_type": "query"
  }'

Response


{
  "object": "list",
  "data": [
    {
      "index": 0,
      "embedding": [
        0.0010356903076171875,
        -0.017669677734375,
        // ...
        -0.0178985595703125
      ],
      "object": "embedding"
    }
  ],
  "model": "nvidia/nv-embedqa-e5-v5",
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
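Client code typically needs only the embedding vectors from this response. The minimal parser below (a sketch over the response shape shown above) returns the vectors ordered by their index field, which keeps them aligned with the order of the request's input array:

```python
import json

def extract_embeddings(response_body: str) -> list[list[float]]:
    """Return embedding vectors ordered by their 'index' field,
    matching the order of the request's input list."""
    data = json.loads(response_body)["data"]
    return [item["embedding"]
            for item in sorted(data, key=lambda d: d["index"])]
```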

For models that do not require the input_type parameter, such as GTE or GTR, use the following sample API call.

cURL Request


curl -X "POST" \
  "http://${HOSTNAME}:${SERVICE_PORT}/v1/embeddings" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": ["Hello world"],
    "model": "nvidia/nv-embedqa-e5-v5"
  }'

Response


{
  "object": "list",
  "data": [
    {
      "index": 0,
      "embedding": [
        0.0010356903076171875,
        -0.017669677734375,
        // ...
        -0.0178985595703125
      ],
      "object": "embedding"
    }
  ],
  "model": "nvidia/nv-embedqa-e5-v5",
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}

Health Check

cURL Request

Use the following command to query the health endpoints.


curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/ready" \
  -H 'Accept: application/json'


curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/live" \
  -H 'Accept: application/json'

Response


{
  "object": "health-response",
  "message": "Service is ready."
}


{
  "object": "health-response",
  "message": "Service is live."
}
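In deployment scripts it is common to poll the readiness endpoint until the service comes up. The retry loop below is a generic sketch: it takes the probe as a callable so the logic is independent of any HTTP client; in practice the probe would GET /v1/health/ready and return True on a successful response.

```python
import time
from typing import Callable

def wait_until_ready(probe: Callable[[], bool],
                     timeout: float = 60.0,
                     interval: float = 2.0) -> bool:
    """Call probe() every `interval` seconds until it returns True
    or `timeout` seconds elapse. Returns whether the service became
    ready within the timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```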

© 2024, NVIDIA Corporation. Last updated on Jul 23, 2024.