Reference
You can download the complete API spec.
NV-Embed-QA and E5 models operate in passage or query mode, and thus require the input_type parameter. Use passage when generating embeddings during indexing, and query when generating embeddings during querying. It is very important to use the correct input_type: failure to do so results in large drops in retrieval accuracy.
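One way to keep the two modes straight in application code is to centralize request construction. The following helper is a minimal sketch (the function name and structure are illustrative, not part of the service or any SDK); it builds the JSON body for the /v1/embeddings endpoint and rejects invalid modes:

```python
def build_embed_request(texts, mode, model="nvidia/nv-embedqa-e5-v5"):
    """Build a /v1/embeddings request body.

    mode must be "passage" (embeddings for indexing) or
    "query" (embeddings for querying).
    """
    if mode not in ("passage", "query"):
        raise ValueError("input_type must be 'passage' or 'query'")
    return {"input": list(texts), "model": model, "input_type": mode}
```

At indexing time you would call build_embed_request(docs, "passage"); at query time, build_embed_request([question], "query"). Centralizing this makes it harder to accidentally index with query embeddings or vice versa.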
Since the OpenAI API does not accept input_type as a parameter, you can instead append the -query or -passage suffix to the model parameter (for example, NV-Embed-QA-query) and omit the input_type field entirely for OpenAI API compliance.
For example, the following two requests are identical.
With the input_type parameter:
curl -X "POST" \
"http://${HOSTNAME}:${SERVICE_PORT}/v1/embeddings" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"input": ["What is the population of Pittsburgh?"],
"model": "nvidia/nv-embedqa-e5-v5",
"input_type": "query"
}'
Without the input_type parameter, using the -query (or -passage) suffix in the model name:
curl -X "POST" \
"http://${HOSTNAME}:${SERVICE_PORT}/v1/embeddings" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"input": ["What is the population of Pittsburgh?"],
"model": "nvidia/nv-embedqa-e5-v5-query"
}'
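The suffix convention shown above can be applied mechanically when adapting code written against an OpenAI-compatible client. As a sketch (the helper name is an assumption, not part of any SDK), the mapping from (model, input_type) to the suffixed model name is:

```python
def openai_compatible_model(model, input_type):
    """Fold input_type into the model name via the -query/-passage
    suffix, for clients that cannot send an input_type field."""
    if input_type not in ("query", "passage"):
        raise ValueError("input_type must be 'query' or 'passage'")
    return f"{model}-{input_type}"
```

A client would then pass the returned name as the model parameter and drop input_type from the request body.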
Note that the GTE and GTR models do not accept the input_type parameter, since query and passage inputs are processed in the same way.
Use the examples in this section to help you get started with using the API.
The complete API spec can be found at OpenAI Spec.
List Models
cURL Request
Use the following command to list the available models.
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/models" \
-H 'Accept: application/json'
Response
{
"object": "list",
"data": [
{
"id": "nvidia/nv-embedqa-e5-v5",
"created": 0,
"object": "model",
"owned_by": "organization-owner"
}
]
}
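A client can pull the available model IDs out of this response with a couple of lines of JSON handling. This sketch parses the response body shown above:

```python
import json

# The response body from GET /v1/models, as shown above.
response_text = """{
  "object": "list",
  "data": [
    {
      "id": "nvidia/nv-embedqa-e5-v5",
      "created": 0,
      "object": "model",
      "owned_by": "organization-owner"
    }
  ]
}"""

# Each entry in "data" describes one model; "id" is the name
# to pass as the "model" parameter in embedding requests.
models = [entry["id"] for entry in json.loads(response_text)["data"]]
```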
Generate Embeddings
cURL Request
curl -X "POST" \
"http://${HOSTNAME}:${SERVICE_PORT}/v1/embeddings" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"input": ["Hello world"],
"model": "nvidia/nv-embedqa-e5-v5",
"input_type": "query"
}'
Response
{
"object": "list",
"data": [
{
"index": 0,
"embedding": [
0.0010356903076171875, -0.017669677734375,
// ...
-0.0178985595703125
],
"object": "embedding"
}
],
"model": "nvidia/nv-embedqa-e5-v5",
"usage": {
"prompt_tokens": 0,
"total_tokens": 0
}
}
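The embedding vectors arrive in data[i].embedding, ordered by index to match the input list. A common next step in retrieval is scoring a query vector against passage vectors with cosine similarity; here is a minimal standard-library sketch (not part of the service API):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In a retrieval pipeline, you would embed passages with input_type "passage", embed the question with input_type "query", and rank passages by this score.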
For models that do not require the input_type parameter, such as GTE or GTR, use the following sample API calls.
cURL Request
curl -X "POST" \
"http://${HOSTNAME}:${SERVICE_PORT}/v1/embeddings" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"input": ["Hello world"],
"model": "nvidia/nv-embedqa-e5-v5"
}'
Response
{
"object": "list",
"data": [
{
"index": 0,
"embedding": [
0.0010356903076171875, -0.017669677734375,
// ...
-0.0178985595703125
],
"object": "embedding"
}
],
"model": "nvidia/nv-embedqa-e5-v5",
"usage": {
"prompt_tokens": 0,
"total_tokens": 0
}
}
Health Check
cURL Request
Use the following command to query the health endpoints.
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/ready" \
-H 'Accept: application/json'
curl "http://${HOSTNAME}:${SERVICE_PORT}/v1/health/live" \
-H 'Accept: application/json'
Response
{
"object": "health-response",
"message": "Service is ready."
}
{
"object": "health-response",
"message": "Service is live."
}
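A deployment script will typically poll the readiness endpoint before sending traffic. The polling loop itself is deployment-specific, but the response-interpretation step can be sketched as a small predicate (the function is an assumption; only the endpoint and response shape come from the docs above):

```python
import json

def is_ready(body: str) -> bool:
    """Interpret a /v1/health/ready response body.

    Returns True only when the service reports the exact
    ready message shown in the reference response.
    """
    try:
        return json.loads(body).get("message") == "Service is ready."
    except json.JSONDecodeError:
        return False
```

A startup script would GET /v1/health/ready in a loop and proceed once is_ready returns True.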